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ABSTRACT 

Although  the  advent  of  fast  and  inexpensive  parallel  computers  has  ren¬ 
dered  numerous  previously  intractable  calculations  feasible,  many  numerical 
simulations  remain  too  resource-intensive  to  be  directly  inserted  in  engineer¬ 
ing  optimization  efforts.  An  attractive  alternative  to  direct  insertion  considers 
models  for  computational  systems:  the  expensive  simulation  is  evoked  only  to 
construct  and  validate  a  simplified  input-output  model;  this  simplified  input- 
output  model  then  serves  as  a  simulation  surrogate  in  subsequent  engineering 
optimization  studies.  We  present  here  a  simple  “Bayesian-validated”  statisti¬ 
cal  framework  for  the  construction,  validation,  and  purposive  application  of 
static  computer  simulation  surrogates.  As  an  example,  we  consider  dissipation- 
transport  optimization  of  laminar-flow  eddy-promoter  heat  exchangers:  paral¬ 
lel  spectral  element  Navier-Stokes  calculations  serve  to  construct  and  validate 
surrogates  for  the  flowrate  and  Nusselt  number;  these  surrogates  then  repre¬ 
sent  the  originating  Navier-Stokes  equations  in  the  ensuing  design  process. 
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ministration  under  NASA  Contract  No.  NASI- 13480  while  the  author  was  in  Engineering 
(1CASE),  NASA  Langley  Research  Center,  Hampton,  VA  23681. 
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1  Introduction 


Large-scale  numerical  calculation,  such  as  fluid  flow  simulation,  is  an  increas¬ 
ingly  significant  component  of  engineering  and  scientific  analysis.  However, 
despite  recent  advances  in  both  algorithms  and  architectures,  many  relevant 
individual  calculations  still  require  many  hours  of  expensive  supercomputer 
time.  As  a  result,  direct  insertion  of  these  resource-intensive  simulations  as 
“subroutine”  calls  in  particular  design  and  optimization  studies  is  typically 
not  viable.  First,  the  number  of  objective-function  evaluations  required  to 
find  an  optimal,  or  even  reasonable,  solution  to  an  optimization  problem  will 
not  be  known  a  priori.  Direct  insertion  of  expensive  simulations  may  ex¬ 
haust  allocated  resources  before  interesting  —  or  even  feasible  —  solutions 
are  obtained.  Second,  effective  engineering  design  and  optimization  processes 
are  evolutionary,  with  goals  and  constraints  continually  modified  to  reflect 
newly  available  information  or  specifications.  Direct  insertion  of  large-scale 
calculations  strongly  inhibits  adaptability:  with  each  revision  of  objectives, 
previous  computations  must  be  discarded,  and  a  new  sequence  of  expensive 
simulations  must  be  initiated.  Third,  the  value  of  expensive  numerical  simu¬ 
lations  can  be  greatly  enhanced  by  proper  incorporation  of  prior  information 
derived  from  collateral  analytical,  experimental,  or  heuristic  investigations. 
Direct  insertion  of  simulation  results  renders  model  fusion  and  validation  dif¬ 
ficult.  Fourth,  most  design  and  optimization  exercises  are  multidisciplinary 
in  nature  [lj,  involving  numerous  relatively  distinct  fields  of  physical  inquiry 
(e.g.,  fluid  mechanics,  solid  mechanics,  physical  chemistry).  Direct  insertion 
of  diverse  simulations  affords  little  opportunity  to  accomodate  —  or  exploit 
—  differing  degrees  of  complexity  and  sensitivity.  In  summary,  if  large-scale 
computation  is  to  graduate  from  analysis  to  synthesis ,  new  paradigms  are 
required. 

One  attractive  solution  to  the  simulation-integration  impasse  considers 
models  for  computational  systems :  the  expensive,  large-scale  simulation,  de¬ 
noted  M°,  is  evoked  only  to  construct  and  validate  a  simplified  compu¬ 
tational  model,  denoted  M\  this  simplified  model,  M ,  then  serves  as  an 
inexpensive  surrogate  for  M°  in  subsequent  engineering  applications.  The 
simplified  model  M  can  be  evaluated  effectively  ad  infinitum ,  can  support 
a  large  class  of  objective  functions,  can  readily  accomodate  extra-simulation 
information,  and  can  be  easily  incorporated  into  multidisciplinary  design 
studies.  The  application  of  models  for  computational  systems  does,  however, 
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raise  new  questions,  in  particular  as  regards  purposiveness:  to  what  extent  is 
the  design  procedure  misdirected,  or  proposed  designs  mispredicted,  by  the 
introduction  of  approximate  simulation  surrogates? 

In  this  paper  we  develop  —  and  apply  —  a  complete  surrogate  frame¬ 
work  for  optimization  based  on  the  simple  validation  concepts  presented  in 
[2].  More  broadly,  the  work  is  founded  upon  several  related  streams  of  in¬ 
quiry.  From  system  identification  (control)  theory  (3-6)  we  borrow  the  notion 
of  algorithmic  logical  empiricism,  in  which  available  data  is  systematically 
incorporated  into  the  model  construction  and  validation  processes;  from  the 
design  of  experiments  [7]  we  appreciate  the  need  for  sampling  heuristics  and 
response  surfaces;  from  statistical  prediction  rules  and  artificial  neural  net¬ 
works  [8-11]  we  adopt  the  concept  of  “construct  and  validate”  —  or  “train 
and  test”  —  data  partitions;  from  the  theory  of  machine  learning  [12,13]  we 
appropriate  the  “probably  approximately  correct”  framework;  from  Monte 
Carlo  methods  [14]  and  the  classical  equivalence  of  measure  and  probability 

[15]  we  derive  our  sampling  procedures;  from  nonparametric  statistical  theory 

[16]  we  deduce  our  statistical  error  estimates;  from  scattered -data  method¬ 
ology  [17]  we  derive  our  model-construction  procedures;  and  from  statistical 
quality-control  theory  [18,19]  we  adapt  relevant  a  posteriori  reliability  con¬ 
cepts.  Lastly,  our  work,  in  philosophy,  is  most  closely  aligned  to  earlier 
seminal  efforts  in  statistical  simulation  surrogates  [20-23],  in  which,  first, 
the  need  for  surrogates  is  motivated,  second,  the  special  role  of  statistical 
statements  is  recognized,  and  third,  the  idiosyncrasies  of  (largely  determinis¬ 
tic)  computer  experiments  are  identified;  other  “non-surrogate”  statistically 
motivated  approaches  to  the  incorporation  of  expensive  simulations  into  op¬ 
timization  studies  [24]  are  also  relevant  to  our  study. 

The  paper  is  organized  as  follows.  In  Section  2,  we  describe  the  op¬ 
timization  framework  in  which  surrogates  will  ultimately  be  applied,  dis¬ 
cuss  the  general  class  of  subproblems  for  which  our  surrogate  methods  are 
most  appropriate,  and  introduce  the  particular  laminar-flow  eddy-promoter 
heat  exchanger  optimization  study  that  will  serve  as  our  detailed  illustra¬ 
tion.  Finally,  we  reiterate  the  motivation  for  the  surrogate  approach,  for¬ 
mally  define  the  surrogate  problem,  and  describe  the  broad  methodolog¬ 
ical  guidelines  that  effective  surrogate  procedures  should  honor.  In  Sec¬ 
tion  3,  we  present  our  modelling  methodology,  treating  both  validation  and 
construction-validation;  the  algorithms  and  error  estimates  are  described, 
and  results  for  the  eddy-promoter  heat  exchanger  are  presented.  In  Section 
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4,  we  consider  the  incorporation  of  surrogate  techniques  into  the  full  opti¬ 
mization  framework,  with  particular  focus  on  purposiveness  and  a  posteriori 
analysis;  the  surrogate-based  optimization  approach  is  illustrated  for  the 
eddy-promoter  heat  exchanger  problem.  In  Section  5,  we  consider  several 
extensions  to  the  surrogate  methodology:  classification  maps;  databoards; 
and  multiple-output  estimates.  Lastly,  in  Section  6,  we  briefly  state  our  con¬ 
clusions.  (For  clarity  and  self-containedness,  *ve  include  here  some  material 
already  discussed  in  [2],  in  particular  as  regards  the  validation  procedure; 
however,  the  optimization  framework  is  new,  as  is  the  treatment  of  a  “real” 
application,  that  is,  a  problem  which  truly  requires  a  surrogate  approach.) 


2  General  Problem  Statement 

2.1  Optimization  Framework 

In  this  paper  we  develop  simulation  surrogate  technioues  designed  to  func¬ 
tion  as  part  of  a  larger  optimization  study:  we  therefore  require  a  general 
optimization  framework  in  which  to  interpret  our  results.  We  emphasize  that 
this  paper  is  not  concerned  with  more  classical  optimization  issues  such  as 
optimality  conditions  and  mathematical  programming  techniques  [25);  we 
assume  that  our  optimization  problems  are  well  posed,  and  that  procedures 
exist  to  find  at  least  local,  and  preferrably  global,  optima. 

We  first  introduce  a  bounded,  lower  semi-continuous  objective  function, 

*(g;A)  :  ft  x  A  —  fit,  (1) 

where  £  is  the  optimization  design  M-vector,  f 2  C  2RAf  is  the  admissible 
(closed)  domain  for  p,  or  “design  space,”  A  is  the  optimization  definition  N- 
vector,  and  A  C  1RS  is  the  admissible  domain  for  A.  The  A  vector  comprises 
coefficients  which,  as  regards  the  optimization  process,  may  be  treated  as 
parameters.  Our  minimization  problem  is  thus:  Find  $min(A),  £*(A)  such 
that 

*nun(A)  =  *(£-( A);  A)  <  *(£;  A)  VE  €  0  ,  (2) 

where  $min(A)  and  P*(A)  are  the  minimum  and  minimizer,  respectively.  We 
explicitly  introduce  the  dependence  of  the  minimum  and  minimizer  on  A  to 
underscore  that  our  optimization  problem  is,  in  fact,  a  family  of  optimization 
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problems  parametrized  by  A.  We  define  4>mjn(A)  as  a  global  minimum  to 
emphasize  that  our  surrogate  techniques  are  intended  to  serve  not  only  final, 
local  optimization  studies,  but  also  initial,  exploratory  design  efforts. 

We  are  interested  in  a  particular  class  of  optimization  problems  in  which 
the  objective  function  can  be  written  in  terms  of  a  subproblem, 

$(p;A)  =  4>(£(g);g;  A) .  (3) 

More  explicitly,  we  can  think  of  evaluating  $(g;  A)  as 


p€f) 


£(p)e£° 


seM* 


#(p;  A)  =  <f»(i;p;A) , 

where  p  is  the  subproblem  input  M-vector,  f)  is  the  subproblem  input  do¬ 
main,  s  €  IRk  is  the  subproblem  output  vector,  «£(p)  is  the  subproblem 
input-output  function,  and  L°°(Cl)  is  the  space  of  bounded  measurable  func¬ 
tions  over  the  domain  fi.  (We  believe  that  most  of  our  results  require  only 
that  £(g)  be  in  L°°{U)K ,  however,  ail  mathematics  in  this  paper  must  be 
considered  purely  formal  pending  complete  hypotheses  and  serious  proofs.) 

We  shall  further  assume  that  the  deterministic  subproblem  input-output 
function,  S{p) :  ]RM  — *  1 RK ,  is  expressed  as  a  functional,  J_,  applied  to  a  field 
U(x,f;g), 

£(E)=2(U(;  ■;£);£),  (4) 

where  U(x,  <;g)  satisfies  the  initial-boundary-value  field  subproblem. 


A£(U) 

U°(x;£). 


(5) 

(6) 


Here  J_  :  X  x  O  — ►  lRh  is  the  output  functional;  X  is  the  function  space  in 
which  the  field  subprobiem  solution  U(x,f;  g)  resides;  x  and  t  are  space  and 
time,  respectively;  Mp  and  Ap  are  deterministic  spatial  differential  operators 
(and  associated  boundary  conditions)  parametrized  by  the  input  vector  g; 
and  U°(x;g)  is  the  initial  condition  on  U(x,t;g). 

We  give  here  a  very  simple  illustration  from  incompressible  fluid  dynamics 
intended  to  render  the  abstract  subproblem  framework  more  comprehensible. 
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To  wit,  we  consider  the  drag  coefficient  for  flow  past  a  cylinder,  in  which 
we  identify:  p  as  a  single  input  (A/  =  1),  the  Reynolds  number:  Q,  the 
input  domain,  as  the  Reynolds-number  interval  of  interest;  s  as  a  single 
output  ( K  =  1),  the  drag  coefficient;  ^(g)  as  the  drag  coefficient-Revnolds 
number  relationship;  2  as  the  time-averaged  streamwise  component  of  the 
integral  of  the  stress-normal  product  over  the  cylinder  surface;  U(x,  t:p)  as 
the  {velocity,  pressure}  pair;  Mp  and  Ap  as  the  incompressible  Navier-Stokes 
system,  in  which  the  Reynolds  number  enters  as  a  parameter.  (Note  that,  for 
reasons  described  in  Section  2.2,  we  will  typically  not  be  interested  in  either 
time  or  space  as  an  input:  all  temporal  and  spatial  dependence  is  eliminated 
by  h  either  by  evaluating  the  subproblem  field  at  a  particular  point,  by 
averaging  over  time  or  space,  or  by  considering  properties  of  asymptotic, 
steady,  or  stationary  solutions.  For  this  simple  example,  we  perform  temporal 
and  spatial  averages  of  temporally  stationary  solutions.) 

We  make  four  final  remarks.  First,  we  have  equated  the  input  variables 
and  the  design  variables  (and  hence  the  input  space  and  the  design  space); 
more  generally,  the  input  variables  may  comprise  only  a  subset  of  the  design 
variables,  £,  but  may  also  include  certain  definition  variables,  A.  The  former 
would  be  fortunate,  reducing  the  size  of  the  subproblem;  the  latter  would  be 
unfortunate,  reducing  the  flexibility  of  a  single  surrogate  to  readily  address 
several  different  optimization  problems.  Second,  we  presume  that  the  field 
subproblem  must  be  solved  numerically,  but  that,  for  the  purposes  of  this 
paper,  all  numerical  errors  are  sufficiently  small  that  we  may  equate  the  nu¬ 
merical  and  exact  solutions.  Third,  as  shall  be  discussed  in  Section  2.2,  we 
shall  be  particularly  interested  in  subproblems  for  which  £(g)  is  computa¬ 
tionally  expensive  to  evaluate  (that  is,  the  field  subproblem  is  difficult);  we 
shall  assume,  however,  that,  once  s  =  £(£)  is  known,  the  objective  function 
$(£;  A)  can  be  inexpensively  evaluated  as  <£(i;£’,  A).  Fourth,  for  the  optimiza¬ 
tion  problems  we  consider,  the  computational  complexity  —  and  the  greatest 
opportunity  for  improved  efficiency  —  resides  not  in  the  search  process,  but 
in  the  objective  function  evaluation. 

2.2  Class  of  Subproblems  of  Interest 

The  surrogate  approach  is  particularly  appropriate  for  optimization  problems 
in  which  the  subproblem  satisfes  the  following  three  “complexity  conditions.” 
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Condition  Cl:  The  £[p)  are  expensive  to  evaluate  in  terms  of  computer  costs, 
elapsed  time,  or  human  effort.  This  condition  may  appear  to  be  transitory, 
given  the  continual  and  rapid  decrease  in  computational  times  and  costs  [26]. 
We  claim,  however,  that  as  computational  capacity  increases,  problem  size 
will  also  grow  to  accomodate:  increased  (perhaps  finally  adequate)  resolu¬ 
tion;  higher  fidelity  mathematical  models;  increased  physical  complexity  of 
new  technologies.  □ 

Condition  C2:  Sharp  regularity  information  on  the  iS(p),  such  as  a  Lipschitz 
condition, 

!£(&)  -£(£,)!  <  cL|£2  -£,1 

is  difficult  to  obtain  and  problem-specific.  This  implies,  in  effect,  that  very 
little  regularity  can  be  assumed  of  the  input-output  function  <£(/?).□ 

Condition  C3:  Knowledge  of  ^(g)  at  one  input  value,  py,  is  of  minimal  com¬ 
putational  value  in  evaluation  of  <S(g)  at  a  second  input  value,  g2:  subproblem 
evaluation  enjoys  no  computational  economy  of  scales.  This  condition  pre¬ 
cludes  certain  {subproblem;  input;  diagnostic}  triples,  such  as  {  •  ,  time;  •  }. 
and  {steady  Navier-Stokes;  Reynolds  number;  Newton  continuation).  In 
both  these  cases  —  assuming  sufficient  regularity  —  later  calculations  can 
exploit  earlier  results  in  order  to  reduce  computational  effort.  (As  we  are 
not  considering  time  as  an  input,  our  surrogates  are  “static;”  this  does  not 
imply,  of  course,  that  the  field  subproblem  involves  only  steady  phenomena, 
nor  that  the  outputs  may  not  contribute  to  a  time-dependent  model.  )□ 

In  colloquial  terms,  for  a  subproblem  which  satisfies  these  three  conditions: 
we  can  not  afford  to  generate  subproblem  solutions  for  many  different  input 
values  (CI,C3);  we  can  not  consider  the  subproblem  solutions  at  a  few  input 
values  to  be  representative  of  the  entire  input  space  (C2);  we  can  not  sim¬ 
plify  subproblem  evaluations  by  exploiting  special  features  (such  as  locality) 
of  the  parent  optimization  problem  (C2,C3).  We  believe  that,  unfortunately, 
there  are  many  problems  which  approximately  satisfy  these  conditions. 

Remark  on  Physical  Experiments.  It  is  of  interest  to  ask  why  our  meth¬ 
ods  are  not  as  appropriate  for  experimental  investigations  as  for  computa¬ 
tional  systems,  and,  conversely,  why  experimental  data  analysis  techniques 


are  not  directly  appropriate  for  computational  systems.  Considering  the 
former,  first,  quite  apart  from  noise,  physical  (in  particular  continuum  me¬ 
chanics)  experiments  often  do  not  satisfy  condition  C3;  for  example,  once 
an  expensive  flow  apparatus  is  configured,  there  are  great  economies  of  scale 
in  (in  fact,  opportunity  costs  incurred  in  not)  obtaining  data  for  a  large 
number  of  flowrates,  rather  than  just  a  few.  This  is  because  much  experi¬ 
mental  equipment  is  problem-specific,  amortized  over  only  a  particular  class 
of  inquiries,  and  because  elapsed  time  is  not  a  serious  consideration  in  many 
(though  not  all)  laboratory  environments.  Second,  despite  recent  advances  in 
transducers  and  imaging,  experimental  data  at  a  particular  input  value,  £',  is 
already  greatly  reduced  with  respect  to  analogous  numerical  data;  whereas 
raw  simulation  data  resident  on  a  databoard  (see  [2]  and  Section  5)  can 
be  subsequently  processed  to  produce  a  wide  range  of  different  outputs,  ex¬ 
perimental  data  can  serve  only  those  applications  requiring  the  few  outputs 
selected  in  the  originating  investigation. 

Turning  now  to  why  well-developed  experimental  data  analysis  techniques 
are  not  directly  applicable  to  simulations,  first,  many  experimental  inquiries 
assume  significant  noise  levels.  In  contrast,  although  computational  inquiries 
do  contain  difficult-to-quantify  factors  (e.g.,  resolution,  incomplete  iteration) 
that  may  perhaps  be  gainfully  interpreted  as  noise,  these  factors  tend  to  be 
both  relatively  small  and  largely  controllable.  Many  experiment-design  tech¬ 
niques  developed  to  reduce  or  understand  uncontrolled  factors  (e.g.,  blocking 
and  randomization  [7])  are  thus  largely  irrelevant  in  the  computational  arena 
[21].  Second,  many  experimental  surrogate  (e.g.,  response  surface)  methods 
are  premised  upon  assumptions  of  both  smoothness  and  locality  (e.g.,  linear 
models,  fractional  factorial  designs  [7]);  although  these  assumptions  may  be 
necessary  for  noisy  experiments,  deterministic  simulations  can  benefit  from 
less  restrictive  hypotheses.  □ 

2.3  Eddy-Promoter  Heat  Exchanger  Example 

2.3.1  The  Optimization  Problem 

As  our  physical  problem  we  consider  two-dimensional  laminar  flow  and  con¬ 
vective  heat  transfer  in  the  eddy-promoter  heat  exchanger  shown  in  Figure 
1.  In  overview,  the  eddy-promoter  channel  comprises  a  two-dimensional 
(infinite  in  13)  plane  channel  with  plate  separation  (in  x?)  2 h,  geometrically 
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interrupted  by  an  infinite  period'--  auay  of  insulating  cylindrical  eddy  pro¬ 
moters  of  bottom  wall-to-cylinder  spacing  a,  radius  R ,  and  pitch  (cylinder- 
to-cylinder  ii -separation)  L.  Heat  enters  the  channel  through  an  isother¬ 
mal  bottom  wall  maintained  at  temperature  T\  (representing,  for  exam¬ 
ple,  a  highly  conducting  plate  housing  electronic  components)  and  leaves 
the  channel  through  an  isothermal  “cold”  top  wall  maintained  at  tempera¬ 
ture  T0.  A  fluid  flow  driven  by  an  imposed  pressure  gradient,  dp/di^x  — 
and  excited  to  significant  cross-stream  transport  by  the  eddy-promoters  — 
serves  to  reduce  the  bottom  wall-to-top  wall  thermal  resistance.  We  are 
interested  in  determining  that  eddy-promoter  placement  and  radius  which 
minimizes  pumping  power  (e.g..  operating  cost)  and  eddy  promoter  volume 
(e.g.,  materials  cost)  while  simultaneously  maintaining  a  temporally  and  spa¬ 
tially  averaged  bottom-wall  heat  flux  (e.g.,  electronic  component  density). 
<  F  >,  not  much  less  than  <  F  >nom.  In  more  quantitative  terms,  we 
wish  to:  minimize  the  pumping  power  +  fa  x  a  penalty  if 

(<  F  > nom  -  <  F  >)/  <  F  >nom  >  0;  with  respect  to  eddy- promoter 
placement  a  and  radius  R;  for  various  objective-function  weights  and 

and  thermal  loads  <  F  >norn. 

(Notational  aside:  dimensional  variables  shall  carry  carats;  length,  veloc¬ 
ity,  and  time  will  be  nondimensionalized  with  respect  to  h.  dp/ dixh2 /2pu, 
and  2pv / dp] di\h,  respectively;  temperature  will  be  measured  relative  to  f0 
and  nondimensionalized  with  respect  to  AT  =  T\  —  To-  The  incompressible 
working  fluid  is  characterized  by  a  constant  density,  p,  kinematic  viscosity,  0. 
thermal  conductivity,  k,  and  thermal  diffusivity,  or.  The  domain  associated 
with  one  periodicity  length  of  the  channel  (arbitrarily  positioned  as  shown 
in  Figure  1)  will  be  denoted  D,  with  the  bottom  and  top  walls  denoted  Bx 
and  Bq ,  respectively,  and  the  eddy-promoter  surface  denoted  Be-  A  generic 
point  in  D  is  (ix , )'>  the  fluid  velocity,  pressure  (perturbation  from  the  im¬ 
posed  linear  field),  and  temperature  are  written  as  u  =  u,ei  +  i^e?,  p.  and 
7\  respectively,  where  are  the  unit  vectors  associated  with  the  two 

coordinate  directions.  Angular  brackets,  <  •  >,  refer  to  time  average  with 
respect  to  a  temporally  stationary  state.) 

We  can  pose  the  (nondimensionai)  eddv-promoter  heat  exchanger  opti¬ 
mization  problem  a*  an  instantiation  of  the  general  framework  of  Section 
2.1:  $,<£  *-+  <&EP,0EP;  M  k-+  A/ep  =  (5)  2,  the  number  of  design  variables; 
l  ~  £ep  =  {(Re),(Pr),  a,  /?,(£,)};  H  --  QEP  =  {.1  <  a  <  1.  .05  <  R  < 
a  —  .05}  (see  Figure  2);  N  NtP  ~  4,  the  number  of  definition  variables; 
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A  •— *  AEP  =  *};  A  AEP  =  HQ.  Here  Re  and  Pr  denote  the 

Reynolds  number  and  Prandtl  number,  defined  as  Re  =  —  dgj  dxhp  (2pi2  and 
Pr  =  i>ja,  respectively,  and  k  =<  F  >nom  2h/kAT  is  the  nondimensional 
thermal  load.  Note  that  for  the  purposes  of  our  optimization  problem  the 
Reynolds  number,  Prandtl  number,  and  eddy-promoter  pitch  are  fixed  at 
Re  =  300,  Pr  =  1.  and  L  —  6.666,  respectively,  leaving  only  two  design 
variables,  the  nondimensional  eddy  promoter  placement,  a,  and  radius,  R: 
we  have  indicated  parenthetically  that  a  more  general  optimization  problem 
might  involve  five  design  variables,  in  which  Re,  Pr,  and  L  are  also  free  to 
vary. 

As  our  objective  function  we  take 

$EP(a,R;AEP)  =  <6EP(£(a,  R),  C(a,  R);  R,  (Re);  AEP )  (7) 

where 

4>EP(Sf,g;  R,{Re)\i3i,!3-2,j33,K)  =  3X {Re)2g  +  &R2  +  i3zH{  1  -  q/K)  ,  (8) 

Here  g  —  Q(a,  R)  is  the  time-averaged  flowrate  through  the  channel, 

Q{a,  R)  —  J^<  u\  >  dx  ,  (9) 

and  q  =  Q(a,  R)  is  the  time-averaged  Nusselt  number, 

Q{a,R)  =<  F  >2h/k&T  .  (10) 

From  <  F  >=  i  /gj  <  — >  dii,  it  follows  that 


It  is  readily  shown  that,  as  required,  ( Re)2g  is  the  pumping  power  per  peri¬ 
odicity  length  per  unit  depth  (nondimensionalized  by  4 pvz/h3),  and  1  —  q/K 
is  (<  F  >nom  -  <  F  >)/  <  F  >nom-  The  penalty  function  'H(z)  is  chosen 
to  be  H(z)  =  H(z)z2,  where  H{z)  is  the  Heaviside  distribution.  It  is  crit¬ 
ical  to  note  that,  even  in  a  real  (not  just  illustrative)  design  exercise,  the 
particular,  initial,  form  for  the  objective  function  is  not  overly  important, 
as  surrogate  techniques  are  designed  to  permit  significant  variation  in  the 
objective  function  at  low  marginal  cost.  Indeed,  it  is  often  only  through  ob¬ 
serving  optima  and  varying  the  objective  function  that  design-goal  intuition 
is  clearly  articulated. 
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2-3.2  The  Eddy-Promoter  Subproblem 

We  next  identify  our  eddy-promoter  subproblem  with  the  general  subprob¬ 
lem  described  in  Section  2.1:  M  >  Mtr  =  (5)  2,  the  number  of  inputs;  g  >— > 
g”  =  {(fie),  (fir), a,  /?,(£)};  -  H"  =  {.1  <  a  <  1,  .05  <  fi  <  a  -  05); 

K  Ktr  =  2,  the  number  of  outputs;  s  £lp  =  f  <?,<?};  jS(g)  *-♦  j£tf>(g)  = 

{£(gK Q(g)};  1*-*  {^i Id  <  u i  >  dx<r/s1  <  >  *»}; u  *-"•  {u.p'.rj; 

Mp,  Ap  »-+  Navier-Stokes  and  forced  convection.  More  explicitly,  the  Navier- 
Stokes  field  subproblem  for  (u(xi,  i2,  t),p'(xt,  i2,  t))  is  given  by. 


du,  du, 

jr  +  Ujik~ 

du, 

dx, 

u 

(U,p')(x,  +  mL,  xj,t) 


dp'  1  aJu, 


^  _2 

Redijdxj  Re  a 


dx, 

0  in  D 

0  on  £?o  U  B\  U  Be 


in  D  (12) 


(u,p')(x,,x2,t)  Vm  6  Z  , 
and  the  forced -convection  energy  equation  for  T(xlfx2,t)  j  given  by, 


dT 


OT 


dt  +  Uj  dx3 
T 
dT 

dn 

T{ii  +  ml,x2,t) 


1 


d2r 


in  D 


fiefir  dijdxj 
0(1)  on  B0  ( B\ ) 

0  on  Be 

r(x,,x2,t)  Vm  e  Z 


(13) 

(14) 

(15) 


(16) 

(17) 

(18) 
(19) 


Here  6i}  is  the  Kronecker-delta  symbol,  d/dn  denotes  differentiation  in  the 
direction  of  the  boundary  normal,  Z  is  the  set  of  integers,  fie  =  300  and  Pr  = 
1,  free  indices  range  over  {1,2),  and  summation  (£?)  over  repeated  indices  is 
assumed.  Initial  conditions  are  ,;ot  important  since,  at  this  Reynolds  number, 
only  a  single  attractor  appears  to  exist  for  all  {a,  fi)  €  Qtp 

We  solve  the  Navier-Stokes  and  energy  equations  with  the  NEKTON  code 
on  the  Intel  iPSC/860  multiprocessor.  The  numerical  method  comprises: 
fractional  timestepping  schemes  [26-28];  spectral  element  spatial  discretiza¬ 
tions  [29];  and  parallel  [26,30]  deflated  [31]  multilevel  conjugate  gradient  [28] 
iterative  solution  procedures.  A  typical  calculation  proceeds  by:  specification 
of  {a,  fi};  automatic  spectral  element  mesh  generation  from  several  skeletal 


templates;  automatic  parallel  partition;  integration  in  time  until  a  temporally 
stationary  state  is  achieved;  evaluation  of  the  requisite  output  functionals. 
The  spectral  element  templates  are  constructed  to  permit  relatively  undis¬ 
torted  meshes  and  adequate  resolution  even  for  extreme  cylinder  placements 
and  sizes.  We  have  confirmed  the  mesh  independence  of  our  calculations 
[32],  and  have  verified  our  results,  where  possible,  with  other  numerical  pre¬ 
dictions  and  experiment  [33-37],  For  a  typical  {a,R}  6  QEP,  roughly  S75 
and  6  16-processor  hours  are  needed  to  reach  the  steady  or  steady-periodic 
state  required  to  evaluate  the  flowrate  and  Nusselt  number;  note  that  each 
subproblem  evaluation  would  cost  as  much  as  $750  on  a  single- processor 
supercomputer  [26]. 

2.3.3  Flow  Phenomena 

We  aim  to  find  ^SS„{A)  and  £EP’(AEP)  =  (a*(AEP),  /J'(AEP)}  such  that,  V{a, /?} 
€  *  «E,*(a*(A"),/?*(AEP);A£P)  <  *EP(a,  R;  AEP).  We  empha¬ 

size  that  this  design  problem  is  not  local:  the  triangular  admissible  design 
space,  fiEP,  shown  in  Figure  2,  admits  virtually  all  geometrically  possible 
cylinder  placements  and  radii.  More  importantly,  different  flow  phenomena 
occur  in  different  regions  of  the  design  space.  Figure  3  depicts  the  isotherms 
for  three  representative  {a,/?}  points  in  0";  for  a  sss  .5  unsteady  steady- 
periodic  supercritical  Tollmien-Schlichting  wali-mode  channel  waves  obtain 
(Figure  3a);  for  large  R  steady  wavy  flows  predominate  (Figure  3b);  for  very 
large  R  the  flow  is  effectively  blocked  (Figure  3c)  [32].  (Recall  that  the 
Navier-Stokes  calculation  is  at  fixed  pressure  gradient,  not  fixed  flowrate.) 

This  paper  is  not  concerned  with  the  details  of  eddy-promoter  flows  ex¬ 
cept  to  the  extent  that  these  details  illustrate  essential  aspects  of  the  surro¬ 
gate  procedure.  Readers  interested  in  more  details  on  dissipation-transport 
optimization  of  eddy-promoter  systems  are  referred  to  [32,34-36].  These 
studies  treat  more  realistic  boundary  conditions  (e.g.,  in  which  the  heat  is 
carried  away  by  the  fluid  flow)  and  optimization  objective  functions,  and  ad¬ 
dress  a  wider  range  of  both  laminar  flows  (numerically  and  experimentally) 
and  transitional  and  turbulent  flows  (experimentally).  However,  extensive 
optimization  with  respect  to  geometric  inputs  (e.g.,  a  and  R)  has  not  been 
undertaken,  due  to  the  expense  (in  this  case,  in  both  the  numerical  and 
experimental  contexts)  associated  with  system  modification  and  re-analysis. 
Readers  interested  in  more  details  on  the  flow  physics ,  in  particular  the  hydro- 


11 


dynamic  stability,  of  eddy-promoter  (and  related)  flows  may  consult  [32,35- 
38];  these  papers  interpret  eddy-promoter  bifurcations  as  the  interaction  of 
simpler-geometry  shear,  cylinder,  and  channel  instabilities.  Recent  quiet  ex¬ 
periments  and  inflow-outflow  numerical  simulations  [37]  indicate  that  the 
initial  instability  is  convective,  not  absolute;  however,  noisier  experiments 
[36]  agree  quite  well  with  periodic  calculations,  suggesting  that  in  engineer¬ 
ing  applications  the  assumption  of  spatial  periodicity  may  be  acceptable  for 
sufficiently  long  channels. 

We  claim  the  eddy-promoter  subproblem  satisfies  the  three  conditions  of 
Section  2.2:  the  computation  is  expensive  and  time-consuming  (Cl);  bifur¬ 
cations  preclude  sharp  regularity  estimates  (C2);  complex  time-dependence 
and  geometry  variation  precludes  continuation  methods  (C3). 

2.4  The  Surrogate  Approach 

In  the  '‘direct  insertion”  approach  to  simulation-based  optimization,  the  ob¬ 
jective  function  4>(g,  A)  is  evaluated,  for  each  candidate  g,  as  <$>(y,  g;  A),  where 

£€ftM-£>U(x,f;g)  jLs£]Rk  . 

> - 

£(p> 

The  disadvantages  of  this  approach  are  described  in  the  Introduction:  the 
number  of  evaluations  of  $(g,  A)  is  not  known  a  priori  —  resources  may  be 
exhausted  before  a  sensible  design  is  proposed;  simulations  evoked  in  a  first 
optimization  study  with  objective  function  $(p,  A1)  will  be  of  limited  use  in  a 
second  optimization  study  with  objective  function  $(p,  A2)  —  design  adapt¬ 
ability  is  frustrated;  and  systematic  fusion  of  simulation  results  with  prior 
analytical,  heuristic,  or  experimental  information  is,  at  best,  difficult.  In 
short,  it  is  difficult  to  perceive  of  a  day-long  thousand-dollar  Navier-Stokes 
simulation  as  a  function  call  from  a  mathematical  programming  routine. 

In  the  surrogate  alternative,  the  subproblem  simulation  is  evoked  only 
to  construct  and  validate  a  simplified  input-output  model,  *£(p):  ft  — ►  IRh , 
which  is  intended  to  approximate  <2(p)  over  the  input  domain  ft.  This  sim¬ 
plified  input-output  model  then  serves  as  a  simulation  surrogate  in  subse¬ 
quent  optimization  studies,  that  is,  $(g,A)  =  <^(«2(g);p;  A)  is  replaced  with 
<f>(£(p);p;A)  =  $(p,  A):  $(p,  A),  not  $(p,  A),  is  minimized.  Given  the  as¬ 
sumptions  of  Section  2.1,  $(g,A)  can  be  inexpensively  evaluated  for  any 
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candidate  £  as  <£(?;£;  A),  where 


£  €  fi  =2  l  €  J2*  . 

The  surrogate  problem  can  thus  be  stated  as  follows:  Given  a  limited  (or 
even  fixed)  number  of  appeals  (recall  conditions  Cl  and  C3)  to  a  largely 
uncharacterized  (recall  condition  C2)  but  deterministic  function,  <£(£);  Find 
a  simple  but  validated  approximation  to  «£(p)  over  H,  <£(p),  which  a)  con¬ 
servatively  but  effectively  exploits  prior  information,  and  b)  can  be  gainfully 
incorporated  into  design  and  optimization  studies. 

The  advantages  of  surrogates  are  manifold:  surrogates  are.  by  construc¬ 
tion,  inexpensive,  and  can  be  evaluated  ad  infinitum  —  premature  termina¬ 
tion  of  the  optimization  procedure  will  not  be  required;  a  single  surrogate 
can  support  a  large  A-family  of  related  objective  functions  —  adaptive,  non- 
incremental  modification  of  design  criteria  and  specifications  is  encouraged; 
surrogates  can  readily  incorporate  prior  extra-numerical  information  con¬ 
cerning  not  only  regularity,  but  also  form  —  thereby  reducing  the  computa¬ 
tional  burden.  The  primary  disadvantage  of  surrogates  is  the  introduction  of 
new  errors  into  the  optimization  process  due  to  the  additional  level  of  approx¬ 
imation,  <£(£(£);£;  A)  %  d>(jS(£);£;  A);  this  “purposiveness”  issue  is  discussed 
in  depth  in  Section  4.  The  many  advantages  (and  significant  disadvantage) 
of  surrogates  have  long  been  recognized:  computational  scientists  typically 
search  for  “insight  not  numbers”;  engineers  often  exploit  reduced-order  mod¬ 
els.  However,  with  the  exception  of  relatively  recent  work  (20-23):  the  sur¬ 
rogate  concept  is  rarely  explicitly  articulated;  application  of  the  surrogate 
concept  is  typically  ad  hoc ;  and  surrogates  are  not  usually  accompanied  by 
useful  error  estimates.  It  is  the  latter  shortcomings  that  we  aim  to  partially 
mitigate. 

The  surrogate  problem  statement  and  the  complexity  conditions  Cl,  C2, 
and  C3  suggest  several  broad  methodological  guidelines.  First,  from  condi¬ 
tions  Cl  and  C3,  we  require  error  estimates  for  a  fixed  number  of  function 
evaluations;  this  implicates  a  statistical  approach,  in  which  uncertainty  can 
be  precisely  accomodated.  Second,  from  condition  C2,  we  must  presume  that, 
in  the  general  case,  we  know  relatively  little  about  our  input-output  func¬ 
tion;  this  implicates  a  nonparametric  statistical  approach.  Third,  from  con¬ 
ditions  Cl  and  C2,  design  sensitivity  derivatives,  though  a  powerful  tool  for 
both  gradient-based  minimization  and  post-optimization  sensitivity  analyses 
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[1,39,40],  may  not  be  sufficient:  more  global,  general  models  for  objective- 
function  approximation  must  be  admitted.  Fourth,  from  conditions  Cl,  C2, 
and  C3,  we  deduce  that  neither  a  priori  nor  a  posteriori  regularity-based 
approximation  and  estimation  techniques  are  effective:  we  can  not  hope  to 
be  asymptotic  (Cl);  we  will  have  very  little  insight  into  the  proper  norm, 
form,  or  constants  of  approximation  errors  (C2);  a  posteriori  error  analysis 
based  on  local  subproblems  or  extrapolation  will  not  be  computationally  vi¬ 
able  (C3).  These  considerations  suggest  that  new  error  norms  are  required. 
Fifth,  from  condition  C3,  we  can  assume  at  least  partial  decoupling  of  the 
parent  optimization  problem  and  the  expensive  subproblem;  although  results 
of  the  former  will  certainly  affect  the  region  in  which  we  choose  to  examine  the 
latter,  the  two  tasks  remain  computationally  relatively  independent.  Armed 
with  this  methodological  outline,  we  now  proceed  to  discuss  the  particular 
algorithms  developed. 

Remark  on  Modelling.  We  assume  here  that  our  mathematical  model, 
M°,  accurately  reflects  the  physical  problem  of  interest,  denoted  M°°.  Our 
computational  surrogate  approach  implicitly  considers  the  modelling  pro¬ 
cess  in  two  stages;  M°°  — +  M°  — ►  M.  An  alternative,  one-stage,  ap¬ 
proach,  A400  — ♦  M,  proceeds  directly  from  the  physical  problem,  ,M°°,  to 
a  computationally  simple  engineering  model,  M,  without  passing  through  a 
large-scale-simulation  intermediary,  M°  (or  physical  experiment).  We  be¬ 
lieve  the  two-stage  approach  is  preferrable,  as  the  more  difficult  problem  of 
physical-to-mathematical  translation  is  conducted  at  a  level  which  accomo¬ 
dates  greater  complexity.  This  greater  flexibility  should  not,  of  course,  serve 
as  an  excuse  for  less  discriminating  modelling  practices. □ 

3  Modelling  Methodology 

In  this  section  we  consider  both  validation  procedures,  in  which  we  assess  a 
given  <£(g),  and  construction-validation  procedures,  in  which  we  both  pro¬ 
pose  and  assess  j£(p).  As  much  of  [2]  is  focussed  on  the  motivation,  analysis, 
and  empirical  verification  of  the  validation  and  construction-validation  al¬ 
gorithms,  we  confine  ourselves  here  to  a  brief  summary  of  the  major  points. 
We  restrict  ourselves  initially  to  a  single  output,  K  —  1;  extension  of  the 
theory  to  multiple  outputs  is  described  in  Section  5. 
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3.1  Validation 

3.1.1  Algorithm 

The  validation  algorithm  takes  as  given:  (i)  A  subproblem,  with  an  input 
M-vector,  £,  an  input  domain,  Q  €  2RM,  a  (single)  output,  s  €  iR,  and  an 
input-output  function,  <S(£)  €  L°°{ Q).  We  prefer,  but  do  not  require,  that 
the  subproblem  satisfy  the  three  complexity  conditions  Cl,  C2,  and  C3,  of 
Section  2.2.  (ii)  A  proposed  surrogate,  <S(£)  :  0  — ►  2R.  (iii)  A  strictly  positive 
Bayesian  importance  function, 

p(£):ft  R+,  j^p(£)d£=l,  (20) 

which  describes  the  a  priori  relative  importance  ascribed  to  different  points 
within  the  input  domain  fl.  As  will  be  discussed  in  Section  4,  this  impor¬ 
tance  function  is  best  interpreted  as  a  prior  “density”  for  £*( A).  We  note  that, 
for  the  error  statements  developed  in  Section  3.1.2,  the  importance  function 
is  required  to  ensure  input-transformation  objectivity,  (iv)  The  maximum 
number  of  S(p)  evaluations  permitted,  Nev,  This  parameter  describes  the 
resource  limitation  associated  with  the  validation  exercise,  (v)  Two  valida¬ 
tion  error  tolerances,  €\,£i  €  [0, 1]J,  the  significance  of  which  will  become 
clear  in  the  validation  error  statement  of  Section  3.1.2. 

We  now  summarize  the  simple  Monte-Carlo  [14]  Model  Validation  (MV) 
Algorithm  of  [2].  We  note  that  this  algorithm  is,  in  effect,  nothing  more  than 
randomization  of  obvious  parameter-space  exploration  procedures;  however, 
the  introduction  of  a  probabilistic  framework  permits  an  error  statement 
which  marries  well  with  subsequent  optimization  analyses. 

Algorithm  MV(5(£),  <S(p),  p(p),  iVev, £i , e2) 

1.  SET  Nva  =  tf{eue2). 


W(£i,  £2) 


lnc2 

In(l-ei)  ' 


(21) 


2.  IF  Nva  >  iVev,  validation  is  not  possible. 

3.  FOR j  =  l,...,Nva 

51.  DRAW  £,  ~  p(£) 

52,  COMPUTE  S}  =  S(£;). 
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4.  SET  £max  =  max;6{i . jv-*}  E:,  where  E}  —  |5;  -  5(£,)|.  □ 


Here  X_  ~  f(x)  refers  to  a  random  vector  X_  with  probability  density  function 
/( i);  we  shall  indicate  a  particular  realization  of  a  random  quantity  Y  (vari¬ 
able,  vector,  or  domain)  as  SfcK.  In  the  MV  Algorithm,  the  P_}  are  random 
vectors,  and  the  S:,  £,,  and  Emax  are  random  variables. 


3.1.2  Error  Analysis 

The  output  of  the  MV  Algorithm,  the  model  prediction  error  estimator, 
EmAX,  is  related  to  the  model  validation  error  estimate,  and  the  validation 
statement  uncertainty,  £2,  by  a  precise  probabilistic  statement. 


Pr 


</ 


{£€n||5(p)-5(E)i<£m»,} 


p{p)dp  >  1  -ti}  >  1  -e2 


(22) 


where  Pr  {event}  is  the  probability  that  event  occurs.  In  words,  (22)  states 
that,  with  probability  greater  than  or  equal  to  I  —  £2,  |5(p)  —  «S(p)|  <  Emiix 
over  a  region  of  fl  of  relative  weighted  volume  greater  than  or  equal  to  l—£\\ 
equivalently,  with  probability  greater  than  or  equal  to  1  —  e2,  |5(p)  —  5(g)|  > 
Etxukx  over  a  region  of  Si  of  relative  weighted  volume  no  greater  than  £j. 
(The  probability  ensemble  here  is  defined  with  respect  to  repetition  of  the 
algorithm:  we  expect  that,  in  greater  than  1  —  e2  of  all  realizations  of  the 
algorithm,  jS(p)—  S(p)\  <  Ema*  over  a  region  of  fi  of  weighted  relative  volume 
greater  than  1  —  £j.) 

The  critical  aspects  of  the  validation  procedure  are:  first,  a  precise  (al¬ 
beit  probabilistic)  error  statement,  (22),  can  be  made  for  a  fixed  number 
of  evaluations  of  «I>(p);  second,  the  sample-size  requirement,  (21),  and  re¬ 
sulting  error  estimate,  (22),  are  nonparametric,  valid  for  any  functions  S(p) 
and  S(p );  third,  the  validation  statement,  (22),  requires  no  assumptions  on 
<!>(£)  or  <S(g)  as  to  regularity  or  functional  form.  Perhaps  most  importantly, 
the  error  estimate  will  also  prove  amenable  to  a  posteriori  analysis  in  the 
optimization  context  (see  Section  4).  The  error  statement  (22)  is  readily 
interpreted  in  the  probably  approximately  correct  framework  developed  for 
classification  problems  in  the  theory  of  learning  [12,13];  in  the  probably- 
approximately-correct  context,  finite  uncertainty  —  in  our  case  represented 
by  and  e2  —  permits  a  precise  statement  for  a  fixed  sample  size. 
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The  derivation  of  (22)  is  given  in  [2]  in  terms  of  order-statistic  tolerance 
limits  [16,41(42].  We  indicate  here  an  alternative  derivation,  based  on  bino¬ 
mial  considerations,  that  has  the  advantage  of  direct  (multinomial)  extension 
to  the  multiple-output  case  (see  Section  5).  We  define  FE  to  be  the  (perforce 
increasing,  though  not  necessarily  strictly  increasing,  nor  continuous)  cumu¬ 
lative  distribution  function  of  the  random  variable  E  —  ]5(£)  — 5(£)[,  where 
£  is  a  random  vector  with  probability  density  />(g).  The  1  -  quantile  of 
E ,  ej_ei,  is  then  defined  by  )  *=  1  “  £i  (more  precisely,  ei_tI  is  the 

minimum  z  such  that  Fg(z)  >  1  -  £i).  Lastly,  jmMX  is  any  j  6  {1, . . . ,  Nva} 
for  which  Ej  =■  Emtx.  Then,  if  7J.*,  =  {£  6  ft  j  1 5(g)  -  5(g)|  <  ei_fl }, 

Pr{Vj  €  {1 . iV“},  £,6  7;..,  }  <  (1  -£,)A'“  .  (23) 

It  follows  that  at  least  one  £,}  will  lie  in  ft  \7i-tl  s*  {g  €  ft  1 1 5(g)  —  5(g) |  > 
et-«i  }  wi*h  probability  greater  than  or  equal  to  1  -(1  —  £1)^”*.  Furthermore, 
if  at  least  one  lies  in  0  \  7i_fl ,  then,  since  Em „  >  Vj  €  {1, . . . ,  }, 

£jm„  particular  will  lie  in  ft  and  thus  Fe(E „«,)  >  1  -ex.  Finally, 

recognizing  FE{t)  -  /{p€n||,s<E)-.S(E)|<e}^E)  db  and  substituting  from  (21) 

(1—  £i)nv‘  =  £2,  we  obtain  (22).  This  binomial  derivation  of  (22)  is  essentially 
a  classification  argument;  not  surprisingly,  the  sample-size  requirement  (21) 
also  appears  in  [13]. 

The  origin  of  uncertainty  in  (22)  is  a  random  region 
U  =  (g  6  ft  |  |5(g)  -  5(g)|  >  £„**}  , 

1  —  £3  probably  of  relative  weighted  volume  less  than  or  equal  to  £j,  of  unde¬ 
termined  location  and  shape,  over  which  the  surrogate  misfit,  |5(g)  -  5(g) |, 
is  unknown.  The  usual  confidence  interval-confidence  level  balance  inherent 
in  (21)  has  an  interesting  interpretation:  if  we  consider  -  ln£2  to  be  how  well 
we  know  the  simulation  behavior,  and  — 1/ in(l  -  £j)  to  be  how  much  of  the 
simulation  behavior  we  know,  then,  for  a  fixed  number  of  appeals  to  5(g), 
JVvo,  (21)  implies  that  the  product  of  how  well  and  how  much  is  fixed  (in  fact, 
equal  to  Nva).  If  we  choose  the  deterministic  limit,  in  which  we  tolerate  no 
uncertainty  (£2  — *  0),  then  how  well  we  know  the  simulation  behavior  tends 
to  infinity,  but  how  much  of  the  simulation  behavior  we  know  tends  to  zero: 
we  know  the  simulation  behavior  only  at  the  points  sampled,  which  is  a  set 
of  measure  zero.  By  permitting  finite  but  controlled  uncertainty  in  how  well , 
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surrogates  accomodate  a  finite  how  much:  surrogates  -  probable  but  global 
—  bridge  the  gap  between  direct  computation,  which  is  sure  but  pointwise, 
and  analytical  methods,  which  are  sure  and  global. 

The  balance  between  ci  and  is  not  “symmetric,”  however.  In  particu¬ 
lar,  ev  In  e-ij Nva  as  Nva  —*  oc  for  e2  fixed,  corresponding  to  rather  slow 
algebraic  decay.  For  example,  for  Nva  =  22,  we  can  choose  €\  =  .1 ,  =  .1; 

doubling  the  number  of  evaluations  to  Nva  =  44  reduces  by  only  a  factor 
of  two,  £i  =  .05,  £j  =  .1.  In  contrast,  £2  ~  e~e'N'a  as  Nva  — *  oo  for 
fixed,  corresponding  to  rapid  exponential  decay.  For  example,  again  begin¬ 
ning  with  Nva  =  22  and  e*  ~  .l,£j  =  .1,  doubling  the  number  of  evaluations 
to  Nva  =  44  permits  a  tenfold  decrease  in  £j,  £i  —  .l,e2  =  .01.  It  follows 
that,  with  only  a  modest  number  of  evaluations  of  5(g),  £2  will  be  sufficiently 
small  that  we  can  assume  with  near  certainty  that  U  is,  indeed,  of  relative 
weighted  volume  less  than  or  equal  to  the  remaining  uncertainties  are  the 
location  of  U,  and  the  surrogate  misfit,  |5(g)  —  5(g) |,  over  U. 

Remark  on  Dimensionality.  Our  algorithm,  sampling  requirement  (21), 
and  error  estimate  (22)  apply  independent  of  input-vector  dimensionality, 
M.  However,  this  generality  is  deceptive;  as  M  increases,  although  the  rel¬ 
ative  volume  of  U  remains  invariant,  U  will  reflect  increasingly  significant 
excursions  in  individual  components  of  the  input  vector  (e.g,,  compare  the 
side  length  of  a  square  and  cube  of  the  same  volume).  This  implies  that 
our  technique  will  not  be  viable  for  too  many  subproblem  inputs  (although 
the  number  of  design  variables  may  be  large).  Surrogate  techniques  should, 
however,  be  extensible  to  problems  of  shape  optimization  [40],  which  are 
essentially  infinite-dimensional,  if  geometrically  motivated  correlations  be¬ 
tween  the  inputs  are  introduced  in  order  to  reduce  —  through  p( g)  —  the 
effective  volume  of  the  input  domain.  □ 

3.1.3  Eddy-Promoter  Example 

We  apply  the  validation  procedure  to  the  eddy-promoter  Nusselt  number, 
Q{a ,  R ).  In  order  to  evoke  the  MV  Algorithm,  the  prerequisites  listed  in  Sec¬ 
tion  3.1.1  must  be  supplied.  The  necessary  quantities  are,  in  fact,  all  defined 
in  Section  2.3,  save  the  proposed  surrogate,  Q{a,R),  the  importance  func¬ 
tion,  p(p),  the  maximum  number  of  evaluations,  Nev,  and  the  uncertainty 
tolerances,  £x  and  £j.  For  our  purposes  here  we  simply  select  for  the  surro- 
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gate  Q(a ,  R)  =  1,  which  corresponds  to  the  Nusselt  number  for  conduction  in 
the  channel  in  the  absence  of  the  eddy-promoters  and  any  flow.  (This  simple 
surrogate  is  chosen  for  lack  of  a  better  heuristic;  however,  this  example  also 
illustrates  application  of  surrogate  procedures  to  test  global  ‘'stability,”  or 
sensitivity,  of  a  solution  —  in  this  case  the  conduction  solution  —  to  vari¬ 
ations  in  the  design  variables.)  We  take  the  importance  function,  p(g),  to 
be  uniform  over  the  triangular  input  domain,  reflecting  no  prior  knowledge 
as  to  which  parts  of  the  domain  will  prove  more  interesting  in  the  ultimate 
optimization  application.  Lastly,  we  set  N'v  —  44,  €\  =  .l,£j  =  .01  (by 
construction,  Nva  =  Af(e i  =  .l,e2  =  -01)  =  44  =  /V'v). 

Implementation  of  the  MV  Algorithm  is  now  straightforward.  First,  we 
employ  a  standard  acceptance-rejection  Monte-Carlo  method  [14]  and  a  con- 
gruential  pseudorandom  number  generator  [43]  to  produce  Nva  =  44  input 
points  which  are  randomly  and  uniformly  d:stributed  over  the  input  domain 
nE,>;  the  input  points,  {a^,  /?,},  j  =  1, . . . ,  Nva ,  resulting  from  one  realization 
of  this  process  are  shown  in  Figure  4.  Next,  the  Nusselt  number  is  computed 
at  each  of  the  input  points,  q}  =  C(a2,  Rj),  j  =  1, . . . ,  Arvo,  following  the  field 
subproblem  evaluation  procedure  described  in  Section  2.3.  Lastly,  we  com¬ 
pute  =  max;€{i . 7vv.}  j<7>  -£>(<*;, -Ry)|  =  max;£{, . **•«}  \q}  -1|;  for  the 

realization  shown  in  Figure  4,  we  find  =  -236.  We  thus  conclude  that, 

with  confidence  level  greater  than  .99,  the  discrepancy  between  Q(a,  R)  (the 
Navier-Stokes  solution)  and  <2  =  1  (our  simple  surrogate)  is  less  than  .236 
over  more  than  90%  of  QEP.  The  error  in  the  surrogate  is,  expectedly,  rather 
large,  confirming  that  the  flow  departs  significantly  from  conduction  within 
the  design  space  U£P .  To  capture  this  departure  in  greater  detail,  we  need 
to  consider  construction-validation  procedures. 

3.2  Construction-Validation 

3.2.1  Algorithm 

The  construction-validation  algorithm  takes  as  given:  (i)  A  subproblem, 
with  an  input  A/-vector,  £,  an  input  domain,  fl  €  1R A/,  a  (single)  output, 
s  €  R,  and  an  input-output  function,  <S(g)  €  Z,°°(rZ).  (ii)  A  modelling 
(approximation)  procedure, 

^•.(nxiij^rtfl),  (24) 
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which,  given  V  input-output  pairs,  Zj  =  •  •  ■  ■>  (£p^(£p))}'  gen' 

erates  the  surrogate  rule,  S(p).  (iii)  A  Bayesian  importance  function,  p(p), 
satisfying  (20).  (iv)  The  maximum  number  of  S(p)  evaluations  permitted, 
Nev.  (v)  Two  validation  error  tolerances,  £j,£2  €  [0,  l]2.  (vi)  The  Model 
Validation  Algorithm  of  Section  3.1.1. 

We  now  summarize  a  simple  Monte— Carlo  Model  Construction-Validation 
(MCV)  Algorithm  based  on  random  datasets  [2]. 

Algorithm  MCV(«S(p),  p(g),  Nev,e  1,^2) 

1.  COMPUTE  Nva  =  A /"(£!, £2)  from  (21). 

2.  IF  Nva  >  Nev ,  QUIT;  ELSE  SET  Nc^nttructl0n)  =  Nev  -  Nva. 

3.  FOR  j  =  1, . . . ,  Nco  (random  dataset): 

51.  DRAW  P}  ~  p(p) 

52.  COMPUTE  S}  =  S{P}). 

4.  SETS(R)  =  A({(Pl,  Sx),...,  Sk<o)}). 

5.  CALL  MV($(E),  5(g), p(£),  Nvo, £j, £2)  -  ° 

(For  simplicity  of  presentation  we  indicate  that  the  first  Nco  input  points 
serve  for  construction  and  the  last  Nva  input  points  serve  for  validation;  in 
practice,  a  sample  of  N*v  input  points  is  drawn  and  then  rando  ly  parti¬ 
tioned  into  construction  and  validation  subsets.)  The  constructed  model  and 
model  prediction  error  estimator,  £maxi  satisfy  our  probabilistic  validation 
statement,  (22).  Although  the  S(E)  are,  in  fact,  random,  for  the  purposes  of 
this  paper  we  shall  condition  all  results  on  a  given  model  S{p). 

The  MCV  Algorithm  presented  is  rather  crude  and  inefficient.  First,  we 
would  prefer  to  compare  and  select  amongst  different  surrogates,  choosing 
that  model  which  incurs  the  smallest  model  prediction  error  estimate  or 
which  is  computationally  least  expensive  [5,6].  Second,  we  would  like  to 
adapt  to  information  generated  during  the  construction-validation  process, 
a  sequential  approach  offers  clear  advantages,  permitting  the  algorithm 
and  the  appeals  to  the  expensive  S(p)  —  to  terminate  when  the  (or  a)  model 
prediction  error  estimate  is  sufficiently  small.  Both  of  these  improvements 
are  made  possible  by  the  multiple-output  extension  described  in  Section  5. 


3.2.2  Classes  of  Models 

Models  can  be  characterized  in  several  ways:  by  dataset  determinis¬ 
tic  or  random;  or  by  procedure,  A,  “graybox”  or  “blackbox."  Although  the 
modelling  problem  would  appear  to  be  a  routine  exercise,  several  factors  com¬ 
plicate  the  process.  First,  the  domain  fi  will  often  be  irregular,  precluding 
simple  tensor-product  techniques.  Second,  the  input  vector  and  domain,  f? 
may  be  of  high  dimension,  M:  most  complex-geometry  interpolation  proce¬ 
dures  developed  for  partial-differential-equation  applications  in  two  or  three 
space  dimensions  are  increasingly  cumbersome  or  computationally  intensive 
with  increasing  space  dimension;  local,  linearized  models  commonly  used  for 
multivariate  response  surfaces  are  inappropriate  for  our  (global)  purposes. 
Third,  random  datasets  offer  certain  advantages  within  the  surrogate  con¬ 
text:  scattered-data  approximation  procedures  [17]  are  considerably  more 
problematic  than  ordered  datasets. 

We  begin  by  comparing  the  relative  advantages  (marked  with  a  4- )  and 
disadvantages  (marked  with  a  — )  of  deterministic  and  random  datasets. 
First,  deterministic  datasets:  (+)  ensure  the  anticipated  distribution  is  real¬ 
ized;  (+ )  can  exploit  existing  datasets;  (+)  permit  a  range  of  well-developed 
approximation  procedures  (tensor-product,  “finite-element”);  (  +  )  permit  a 
priori  regularity-based  approximation-error  estimates;  (  — )  extend  with  some 
difficulty  to  complex  fi,  in  particular  for  larger  A/;  (  — )  preclude  recycling  of 
datapoints  for  (perforce  random)  validation  (see  [2]  and  Section  5).  Random 
datasets:  (— )  exhibit  fickle  “distribution”;  (— )  disqualify  existing  (determin¬ 
istic)  data;  (— )  permit  only  scattered-data  approximation  methods,  such 
as  Voronoi  methods  [2,44],  modified  Shepard  techniques  [17,45],  and  radial- 
basis-function  approaches  [11];  (-)  permit  only  limited  a  prion  regularity- 
based  approximation  error  estimates;  (  +  )  extend  readily  to  complex  fl;  (  +  ) 
extend  readily  to  larger  A/;  (+)  permit  recycling  for  validation. 

We  also  briefly  compare  graybox  and  blackbox  modelling  approaches. 
Graybox  models  are  intended  to  reflect  prior  information  as  to  the  antici¬ 
pated  form  of  the  phenomenon  under  consideration.  The  resulting  model 
is  “parametric,”  involving  a  finite  number  of  to-be-determined  basis  co¬ 
efficients  representing  a  fixed-dimensional  approximation  space.  Blackbox 
models  are  intended  to  be  largely  unbiased  as  to  possible  functional  form, 
though  clearly  some  minimal  regularity  assumptions  are  required.  Black¬ 
box  models  are  preferrably  “nonparametric,”  permitting  the  approximation 
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of  arbitrary  functions  to  arbitrary  accuracy  by  consideration  of  a  family  of 
approximation  spaces  of  increasingly  large  dimension.  Graybox  and  black¬ 
box  approaches  can  be  gainfully  combined  as  blackbox -corrected  graybox 
models. 

In  [2]  we  develop  a  random-dataset  Voronoi-based  piecewise-constant 
blackbox  approximation  procedure  for  complex  domains  which  extends  di¬ 
rectly  and  efficiently  (linearly  in  complexity)  with  increasing  input  dimen¬ 
sion,  M;  the  technique  is  applied  to  a  problem  of  stress  concentration  in 
linear  elasticity.  Unfortunately,  although  the  Voronoi  method,  which  is  ef¬ 
fectively  a  piecewise  constant  finite  element  approximation  over  convex  tiles, 
does  enjoy  certain  approximation  properties,  the  method  is  too  low-order 
to  make  effective  use  of  the  perforce  limited  datasets  available  for  surrogate 
construction.  In  the  current  paper  we  choose  for  our  construction  procedure 
the  two-dimensional  implementation  of  the  scattered-data  (random  dataset) 
modified  Shepard  method,  A  *-*  QSHEP2D  (ACM  Algorithm  660,  [45]).  At 
present  both  the  Voronoi  and  Shepard  methods  are  “Lagrangian,”  based  only 
on  function  values;  as  numerical  and  automatic  differentiation  techniques 
[46,47]  become  better  developed,  uHermitianTI  approximations  incorporating 
sensitivity  derivatives  may  prove  more  efficient.  General  multivariate  (large- 
M)  approximation  theory  remains  an  open  research  area. 

3.2.3  Eddy-Promoter  Example 

We  now  apply  the  construction-validation  procedure  to  the  eddy-promoter 
flowrate  and  Nusselt  number,  Q(a,  R)  and  Q(a,/?),  respectively.  In  order  to 
evoke  the  MCV  Algorithm,  the  prerequisites  listed  in  Section  3.1.1  must  be 
supplied.  The  necessary  quantities  are  all  defined  in  Section  2.3,  save  the 
approximation  procedure  A,  the  importance  function,  p(jt>),  the  maximum 
number  of  evaluations,  Nev ,  and  the  uncertainty  tolerances.  and  e2.  As 
described  in  Section  3.2.2,  we  use  the  Renka  [45]  implementation  of  the  modi¬ 
fied  Shepard  method  as  our  construction  procedure.  We  take  the  importance 
function,  p(p),  to  be  uniform  over  the  triangular  input  domain,  reflecting 
no  prior  prejudice  as  to  areas  of  potentially  higher  interest.  Lastly,  we  set 
N'v  =  44, £i  =  .1,  and  e2  =  .1  ( Nva  =  Af(e \  —  .l,e2  =  .1)  =  22,  and  thus 
Nco  _  yva.  _  22). 

Implementation  of  the  MCV  Algorithm  is  now  straightforward.  First,  as 
in  the  MV  Algorithm,  we  employ  a  standard  acceptance-rejection  Monte- 
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Carlo  method  and  a  congruential  pseudorandom  number  generator  to  pro¬ 
duce  Nev  =  44  points  which  are  randomly  and  uniformly  distributed  over  the 
input  domain  QEP.  Then,  the  flowrate  and  Nusselt  number  are  computed  at 

each  of  these  input  points,  {<?_,, <?j}  =  R} ),  Q(a;,  Rj)},j  =  1, - Nev , 

following  the  field  subproblem  evaluation  procedure  described  in  Section  2.3. 
Next,  these  Nev  =  44  input-output  pairs  are  randomly  partitioned  into  two 
subsets,  a  construction  set  of  size  Nco  —  22  for  Step  4  of  the  MCV  Al¬ 
gorithm,  and  a  validation  set  of  size  Nva  =  22  for  Step  5  of  the  MCV 
Algorithm.  Finally,  the  surrogates  are  formed  and  tested.  Our  construc¬ 
tion  method  is  slightly  modified  to  include  prior  information:  QSHEP2D  is 
evoked  in  Step  4  not  with  Nco  points,  but  with  Nco  +  2  points.  The  two  ad¬ 
ditional  contributions  comprise  a  ‘‘plane-Poiseuille-flow”  input-output  pair, 
({a.R}  —  {0,0}, g  =  2/3),  ({a,  R}  =  {0,0},?  =  1),  and  a  prior-work  eddy- 
promoter  Tollmien-Schlichting  input-output  pair,  ({a,/?}  =  {.5,  .2},  <7  = 
.311),  ({a,/?}  =  {.5,  -2},q  =  1.12).  (Although  the  Nusselt  number  for  the 
extensively  studied  {a,/?}  =  {.5,. 2}  geometry  [35-37]  can  be  estimated  from 
published  data  [35],  we  prefer  to  exactly  recompute  g  and  q  for  the  precise 
boundary  conditions  of  the  current  paper.) 

For  the  particular  (single)  train-test  realization  of  Figure  5,  we  obtain 
the  flowrate  and  Nusselt  number  surrogates  shown  in  Figure  6,  and  model 
prediction  error  estimates  for  the  flowrate  and  Nusselt  number  of  ftE®^  = 
.035  and  E^*  =  .092,  respectively.  We  thus  conclude  that,  with  confidence 
level  greater  than  .90,  the  discrepancy  between  Q(a,R)  (the  Navier-Stokes 
solution)  and  Q  (our  surrogate)  is  less  than  .035  over  more  than  90%  of  fiEP; 
similarly  (but  not  jointly,  see  below),  with  confidence  level  greater  than  .90, 
the  discrepancy  between  Q(a,  R)  (the  Navier-Stokes  solution)  and  Q  (our 
surrogate)  is  less  than  .092  over  more  than  90%  of  QEP.  Discussing  first  the 
flowrate,  we  see  from  Figure  6  that,  not  surprisingly,  for  our  fixed  pressure 
gradient,  the  flowrate  decreases  for  cylinders  either  farther  away  from  the 
wall  or  of  larger  radius:  both  of  these  variations  increase  the  drag  on  the 
eddy-promoter.  The  flowrate  surrogate  is  rather  accurate,  with  a  model 
prediction  error  estimate  of  only  .035  over  an  observed  flowrate  range  of  0  < 
g  <  .667.  Turning  now  to  the  Nusselt  number,  we  see  that  the  largest  Nusselt 
number  obtains  for  the  larger-cylinder  steady  wavy  mechanism,  but  that 
significant  transport  also  occurs  for  the  unsteady  Tollmien-Schlichting  mode. 
The  Nusselt  number  surrogate  is  less  accurate  than  the  flowrate  surrogate, 
with  a  model  prediction  error  estimate  of  .092  over  an  observ<ad  Nusselt 


23 


number  range  of  .764  <  a  <  1.186. 

The  reader  will  notice  that  the  input  points  are  the  same  for  the  validation 
example  of  Section  3.1.3  (see  Figure  4)  and  construction-validation  example 
(see  Figure  5)  of  the  current  section;  furthermore,  the  flov/rate  and  Nusselt 
number  are  both  validated  on  the  same  set  of  input  points.  In  essence,  we  are 
exploiting  the  databoard  concept  (see  [2]  and  Section  5),  in  which  a  single 
set  of  input  points  is  recycled  for  different  models,  outputs,  or  optimization 
studies.  Note  however,  that,  in  the  current  single-output  context  (K  -  1). 
each  example  must  be  treated  as  a  separate  problem  —  the  sample-size 
requirement  (21)  and  associated  error  estimates  (22)  are  not  joint.  In  Section 
5  we  describe  the  simple  modification  which  permits  us  to  state  joint  error 
estimates  for  multiple  outputs  validated  over  a  common  input  set. 

4  Optimization  Purposiveness 

4.1  Surrogate-Based  Optimization 

We  recall  from  Section  2.4  that  the  essential  aspect  of  surrogate  optimization 
is  the  replacement  of  the  actual  objective  function,  4>(£,  A)  =  <?(£(p);  p;  A), 
with  a  surrogate  objective  function,  $(p.  A)  =  <?(< £(p);p;A).  Within  this 
broad  framework,  however,  several  different  approaches  are  possible.  First, 
one  can  proceed  in  an  “unvalidated  mode,''  in  which  one  tests  surrogate  pre¬ 
dictions  only  at  surrogate-proposed  design  points.  This  approach  has  the 
advantage  that  all  points  are  dedicated  to  construction,  but  the  disadvan¬ 
tages  that:  first,  even  if  the  surrogate  is  accurate  at  the  proposed  design 
point,  one  has  no  assurances  as  to  the  accuracy  of  the  surrogate  at  other 
input  points  upon  which  selection  of  the  design  point  may  be  conditioned 
(e.g.,  through  gradient  information  or  simple  rejection);  second,  if  the  surro¬ 
gate  is  not  sufficiently  accurate  at  the  proposed  design  point,  an  appropriate 
course  of  action  is  not  clear.  A  second  approach,  “learn-by-doing”  (e.g., 
[48]),  performs  construction-validation  dunng  the  optimization  process;  that 
is,  the  surrogate  model  input  sample  reflects  the  local  structure  of  the  ob¬ 
jective  function.  The  learn-by-doing  approach  has  the  advantage  that  the 
importance  function  is,  perforce,  relevant  to  the  local  search  process,  but,  the 
disadvantage  that  the  surrogate  developed  for  one  objective  function  may 
be  inappropriate  for  a  subsequent  optimization  study  in  which  the  objective 


24 


function  has  been  modified. 

We  pursue  here  a  third  approach,  a  “Bayesian  validated”  approach,  in 
which  the  validation  importance  function,  p(p),  serves  to  indicate  the  antici¬ 
pated  relative  relevance  of  points  within  the  feasible  design  space  ft;  ideally, 
in  a  parent  optimization  project  involving  numerous  optimization  studies 
parametrized  by  A,  p'( X)  would  be  “distributed”  according  to  />(£').  It  is 
critical  to  note  that  we  rely  on  this  definition  to  motivate,  but  not  to  jus¬ 
tify,  the  choice  of  p(p);  techniques  for  determining  the  de  facto  influence  of 
p(jj)  on  any  single  optimization  study  (that  is,  particular  A)  are  presented 
in  Section  4.3.  The  Bayesian  validated  approach  permits  better  a  posteri¬ 
ori  error  estimates  than  the  “unvalidated  mode,"  though  at  the  expense  of 
fewer  points  for  construction.  (In  fact,  validation  points  can.  subsequent  to 
validation  in  Step  5  of  the  MCV  Algorithm,  be  included  in  a  revised  con¬ 
struction,  however  the  resulting  model  no  longer  satisfies  a  rigorous  error 
statement.)  The  Bayesian  validated  approach  ensures  greater  flexibility  than 
the  “Iearn-by-doing”  approach,  though  at  the  cost  of  lower  relevance  for  any 
particular  study.  In  summary,  the  Bayesian  validated  approach  is  probably 
better  suited  for  initial,  global  studies  than  for  final,  local  designs. 

4.2  Monte  Carlo  Algorithm 

Our  surrogate-based  optimization  algorithm  takes  as  given:  (i)  A  subprob¬ 
lem,  with  an  input  A/-vector,  p,  an  input  domain,  ft  €  RM ,  a  (now  possibly 
multiple)  output,  s  €  IRA.  and  an  input-output  function,  £(£>)  €  £°°(ft)A. 
(ii)  A  modelling  (approximation)  procedure,  *4(— 5  )<  as  described  in  Section 
3.2.  (iii)  An  optimization  evaluation  procedure,  A)  :  ]RK  x  RM  x  A  — ► 

R.  (iv)  A  Bayesian  importance  function,  p(g),  satisfying  (20).  (v)  The  max¬ 
imum  number  of  £(g)  evaluations  permitted,  Nev .  (vi)  Two  validation  error 
tolerances,  £i,£j  €  [0, 1]J. 

We  now  summarize  the  Surrogate-Based  Optimization  (SBO)  Algorithm. 
Algorithm  SBO(«£(s;  p;  X ),p(p),  Nev,eue2) 

1.  COMPUTE  Nua  =  Af(eue2)  from  (21). 

2.  IF  Nm  >  Nev  QUIT;  ELSE  SET  Nco  =  N'v  -  Nua. 

3.  FOR  j  =  1, . . . ,  /Vev: 

SI.  DRAW  P}  ~p(g) 
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S2.  COMPUTE  Sj  =  £(£,)• 

4.  SET  l(R)  =  A(  { (£1  >  )> ... ,  (£*« ,  ) } ). 

5.  FOR  A  =  A1, A2, A3,... 

{  6.  FIND  l^nU)  and  p’(A),  where 

^n(A)  =  MiR  (A));£'(A);  A)  <  *(£(£);£;  A)  ve  g  0 .  (25) 

7.  SET  =  maxj€{iv«+i . n Ef,  where 

£*H*(£>;£,;A) -*(£(£,);£, ;A)I- 

8.  COMPUTE  a  posteriori  estimates  (see  Section  4.3). 

9.  CONSIDER  adaptive  refinement  (see  Section  4.4).  }  □ 

(For  simplicity  of  presentation  we  indicate  that  the  first  Neo  input  points 
serve  for  construction  and  the  last  Nva  input  points  serve  for  validation;  in 
practice,  the  sample  of  Nev  input  points  is  randomly  partitioned  into  con¬ 
struction  and  validation  subsets.)  The  $min(A)  and  £*(A)  of  (25)  will  be  de¬ 
noted  the  “surrogate  minimum”  (more  precisely,  the  global  minimum  of  the 
surrogate  objective  function)  and  the  (or  a)  “surrogate  minimizer,”  respec¬ 
tively;  $min(A)  an<*  £*(A)  of  (2)  will  be  referred  to  as  the  “actual  minimum” 
(more  properly,  the  minimum  of  the  actual  objective  function,  $(£,  A))  and 
“actual  minimizer,”  respectively. 

We  make  several  comments  concerning  the  SBO  Algorithm.  First,  the 
Am  of  Step  5  are  selected  based  on  currently  available  information,  including, 
perhaps,  the  results  of  previous  optimization  studies  (that  is,  for  Xn,n  <  m). 
Second,  we  remark  that  the  surrogate  objective  function  is  constructed  only 
indirectly;  that  is,  rather  than  directly  construct  the  objective  function  out¬ 
put  from  A  applied  to  (£,<£(&>(£);£;  A))  pairs,  we  first  construct  surrogates, 
£(p),  for  the  “intermediate  physical  outputs,”  £(p),  in  Step  4,  and  then 
simply  evaluate  <f>(£.(g);  g]  A)  in  Step  6  and  Step  7.  For  example,  for  the 
eddy-promoter  heat  exchanger,  we  construct  (7)  not  directly  from  the  sam¬ 
pled  data,  but  indirectly  from  the  intermediate  surrogates  for  flowrate  and 
Nusselt  number  described  in  Section  3.2.3.  We  prefer  this  two-stage  ap¬ 
proach  to  direct  construction  of  the  objective  function  because:  each  new  A™ 
does  not  require  re-appeal  tc  a  construction  procedure;  we  are  more  likely 
to  have  prior  information  for  the  physical  quantities  than  for  an  artificially 
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synthesized  objective  function.  Third,  we  note  that,  despite  the  indirect 
construction-cum-evaluation  of  the  surrogate  objective  function,  we  directly 
validate  the  surrogate  objective  function  in  Step  7,  that  is,  we  directly  com¬ 
pute  the  errors  in  the  objective  function  rather  than  (less  precisely)  infer 
these  errors  from  the  errors  in  j£(g).  From  Step  7  and  Section  3.1.2  we  know 

Pr {/  _  a  p(£)d£>  l-£i}>l-e3,  (26) 

or,  equivalently, 

?T{JU'p(j>)<tl<£\}  >1  ~£i  ,  (27) 

where  W*  =  {£  6  H  |  |$(p;  A.)  —  4>(p;  A)(  >  These  (effectively  single¬ 

output)  objective  function  prediction  error  estimates  are  required  for  the  a 
posteriori  analysis  described  in  Section  4.3. 

We  close  this  section  by  remarking  that  our  algorithm  is  related  to,  but 
significantly  different  from,  several  other  stochastic  optimization  procedures 
[14].  First,  as  compared  to  the  simplest  random  search  procedure,  in  which 
the  minimum  is  approximated  as  the  minimum  of  a  random  sample,  our 
approach  offers  two  advantages:  by  constructing  and  subsequently  minimiz¬ 
ing  a  surrogate,  we  can  exploit  whatever  prior  information  may  be  available 
for,  and  whatever  continuity  may  be  present  in,  the  objective  function:  the 
surrogate  reveals  internal  error  and  sensitivity  estimates  not  apparent  from 
the  bare  “nodal  values”  used  in  the  random  search  procedure.  Second,  as 
compared  to  multistart  techniques,  in  which  a  random  sample  serves  as  the 
starting  point  for  many  parallel  local  (e.g.,  gradient-based)  minimization 
problems:  the  multistart  technique  shares  the  disadvantages  (but  also  ad¬ 
vantages)  of  direct  insertion  as  regards  each  local  search;  the  probabilistic 
estimates  for  the  multistart  technique,  although  ostensibly  similar  to  our 
validation  statements,  are  in  fact  expressed  in  terms  of  the  (unknown)  size 
of  the  basin  of  attraction  associated  with  the  global  minimum. 

4.3  A  Posteriori  Estimates 

We  discuss  in  this  subsection  what  can  be  said  of  a  definitive  (or  almost 
definitive)  nature  following  surrogate-based  optimization.  Our  goal  is  to  un¬ 
derstand,  for  any  given  single  realization,  the  influence  of  the  selected  impor¬ 
tance  function,  p(p ),  on  the  reliability  of  the  surrogate- based  optimization 
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process;  our  notion  of  success  is  thus  very  different  than  that  adopted  in 
[12],  in  which  the  quality  of  the  surrogate  is  quantified  only  for  the  case  of 
repeated  trials  according  to  the  initially  prescribed  importance  function.  The 
a  posteriori  estimates  of  Step  8  serve  in  Step  9  first,  to  ascertain  if  the  sur¬ 
rogate  minimum  is  acceptable,  and  second,  if  the  surrogate  minimum  is  not 
acceptable,  to  guide  subsequent  adaptive  improvement  efforts.  It  is  critical 
to  note  that,  consistent  with  the  notion  of  an  expensive  subproblem,  all  a 
posteriori  estimates  in  Step  8  require  no  new  appeals  to  £(p). 

4.3.1  General  Case 

We  state  our  result,  present  a  brief  formal  derivation,  and  discuss  the  theo¬ 
retical  and  practical  implications.  To  begin,  for  any  r  6  (1, 1/ej),  we  define 
Xr  to  be  the  set  of  all  closed  domains,  1Z,  in  Q.  for  which  JK  p{p)dp  =  re j. 
We  then  set 

6  =  min  max{$(p)  -  $„„„}  ,  (28) 

flext  — 

ns  =  arg  mn(max{$(£)  -  ^n})  .  (29) 

In  essence,  the  sensitivity  region  7 Zg  is  that  (or  a)  region  of  relative  weighted 
volume  re i  for  which  the  deviation  of  the  surrogate  objective  function  from 
$min  is  minimal;  this  minimal  deviation,  6,  reflects  the  sensitivity  of  the  sur¬ 
rogate  objective  function  to  variations  in  the  design  variables  in  the  vicinity 
of  the  surrogate  minimizer.  We  can  now  express  a  form  of  “lower  semi¬ 
continuity:”  with  probability  greater  than  1  —  e2> 

3 £  €  7^,p0,  such  that  $(p0,A)  <  $min  +  s?  ,  (30) 

where  G7,  the  “predictability  gap,”  is  the  random  variable 

v  =  E^x+6.  (31) 

Note  that  (30)  says  nothing  concerning  the  discrepancy  between  the  surro¬ 
gate  minimum  and  actual  minimum  or  the  surrogate  minimizer  and  actual 
minimizer;  without  further  hypotheses  on  $(p;A)  and  $(p;  A),  no  such  state¬ 
ment  can  be  made.  Condition  (30)  does,  however,  say  something  concerning 
the  reliability  of  the  surrogate  prediction:  within  a  constructable  region.  Us, 
there  exists  a  point  (in  fact,  many  points)  at  which  actual  system  performance 
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is  within  to  of  the  surrogate-predicted  optimum.  Our  underlying  strategy 
is  to  strive  for  a  global  minimum  of  $(p;  A),  but  to  require  reliability  in  the 
surrogate  prediction;  our  approach  is  thus,  at  least  philosophically,  related 
to  the  Taguchi  approach  to  quality  control  [18,19). 

To  derive  (30),  we  first  note  that,  from  (27),  with  probability  greater  than 
or  equal  to  1  —£2,  U*  is  of  relative  weighted  volume  less  than  or  equal  to  £j. 
If  U *  is  of  relative  weighted  volume  less  than  or  equal  to  £x,  then  Itf,  \  U * 
must  be  nonempty,  since  71$  is  of  relative  weighted  volume  strictly  greater 
than  £1  (recall  r  >  1).  Then,  for  any  point  g'  in  Its  \  U* , 

|$(g';A)-$(£-;A)|  =  ^(E';A)-$(g';A)  +  $(g';A)-$(g;A)| 

<  |*(cf;A)  -  *(£';A)I  +  |«(£f;A)  -  *(£*;  A)l  • 

As  |$(g';A)  -♦(£';  A)|  <  3L*  (recall  g'  #14*),  and  |$(g';  A)  -  $(g*;  A)|  <  <5, 
(30-31)  directly  follows,  with  g0  =  g'.  From  this  derivation  it  is  clear  that 
(30)  applies  to  each  Am  of  Step  5  separately,  not  jointly;  however,  the  SBO 
Algorithm  is  readily  extended  such  that  (30)  is  jointly  valid  (see  Section  5). 

In  order  to  illustrate  (30),  we  consider  a  simple  model  problem  with 
M  —  2,  g  €  fi  =  [—1,1]  x  [—1,1],  and  p(g)  =  1/4.  We  presume  that  the 
result  of  the  SBO  Algorithm  is  a  surrogate  objective  function, 

$(g)  =  p2+u^  +  l  (w>l),  (32) 

with  minimum  and  minimizer  $min  =  1  and  p*  =  {0,0},  respectively,  and  pre¬ 
diction  error  estimate  R£^x.  It  is  then  readily  computed  that  8  =  4r£1u.,/7ri 
with  7 Zs  given  by  the  elliptical  region  centered  at  the  origin  with  major  ( p\ ) 
and  minor  (P2)  axes  of  \fl  and  v/J/w,  respectively.  The  predictability  gap  is 
thus  +  4reiu/K.  It  is  clear,  since  we  have  not  even  defined  the 

actual  objective  function,  $(g;  A),  that  this  analysis  is  based  entirely  upon 
appeals  to  the  surrogate.  This  model  problem  is  not,  of  course,  completely 
arbitrary;  the  M-dimensional  generalization  of  the  objective  function  (32)  is 
a  local  representation  of  any  sufficiently  differentiable  objective  function  near 
an  interior  minimizer. 

Turning  now  to  the  implications  of  (30),  we  remark,  first,  that  (30)  quan¬ 
tifies  the  effect  of  a  poorly  selected  p(p ):  we  expect  that  8,  \RS\  and  nr  will 
be  inversely  proportional  to  p(g‘( A)).  Second,  we  understand  the  origin  of 
the  predictability  gap,  to,  as  distinguishable  construction,  i and  vali¬ 
dation,  <5,  contributions.  We  expect  that,  as  Nco  and  Nva  —*  00 (£2  fixed). 
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Emax,6  and  hence  zo  will  tend  to  zero;  furthermore,  for  any  particular  study 
at  finite  Nev,  the  ratio  (  —  'StE^lsx/6  provides  valuable  “construction  versus 
validation”  guidance  for  adaptive  improvement  (see  Section  4.4).  Third,  the 
continuity  statement  (30)  includes  a  notion  of  “p-sensitivity,”  which  we  de¬ 
fine  as  the  sensitivity  of  $(£;  A)  to  variations  in  p.  (We  contrast  ^-sensitivity 
to  “A-sensitivity,”  which  we  define  as  the  sensitivity  of  design  points,  such 
as  $min(A)  and  £*(A),  to  variations  in  A.)  Our  “£-sensitivity"  result  should 
prove  quite  useful  in  preliminary  optimization  studies:  if  rz  is  acceptably 
small,  and  Its  is  acceptably  located,  the  precise  design  point  need  not  be 
specified,  thereby  maintaining  maximal  flexibility  in  the  ensuing  design  pro¬ 
cess.  This  flexibility  is  particularly  important  when  the  optimization  prob¬ 
lem,  (2),  reflects  only  one  subsystem  of  a  larger,  more  complex  endeavor  [49], 

Remark:  Random  Search  Revisited.  It  is  readily  shown  that,  with 
probability  greater  than  1  —  (as  £i  — ►  0),  a  validation  input  point, 
resides  in  Its  (and  hence  Its  \  M*)-  Thus,  even  in  the  worst  case,  in  which 
we  simply  set  £0  of  (30)  to  the  surrogate  approach  reproduces  the  sim¬ 
ple  random-search  result,  and,  additionally,  provides:  valuable  p-sensitivity 
information  through  condition  (30);  convergence  guidance  through  the  quasi- 
convex  analysis  of  the  next  section.  □ 

4.3.2  Quasi-Convex  Case 

We  consider  here  the  case  in  which  ft  is  convex,  and  $(p;  A)  and  4>(p;  A) 
are  quasi-convex  in  the  first  argument.  (A  function  /(£):  ft  — *  2R  is  quasi- 
convex  if:  Va,£t,£2  €  ([0,  l],n,f)),/(a£1  +  (l -a)^)  <  max[/(£t), /(£,)];  or, 
equivalently,  the  level  sets,  {£  €  ft  |  /(£)  <  v5}*  are  convex  [25].)  We  first  state 
our  main  result:  given  a  “separating  value”  random  variable  A  =  2E%MX  +  6 
and  an  associated  random  “buffer  zone,” 

5a  =  {£€ft  >  A}  , 

a  random  “containing  region,"  AC,  can  be  constructed  in  which,  with  prob¬ 
ability  greater  than  1  -  e3,  an  actual  minimizer,  £*,  must  reside;  further¬ 
more,  as  E^x  and  tend  to  zero,  the  region  AC  shrinks  to  p‘,  the  surrogate 
minimizer.  The  AC-construction  depends  only  on  p(£),r,  and  the  geomet¬ 
ric  properties  of  Its  and  B&.  In  this  section  we:  formally  derive  the  AC- 
construction  for  a  general  one-dimensional  (Af  =  1)  optimization  problem; 
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state  the  AC -construction  for  a  particular  two-dimensional  {M  =  2)  model 
optimization  problem;  and  discuss  the  implications.  The  derivation  of  the 
general  M  =  2  AC-construction  is  given  in  [32];  development  of  the  general 
M  >  2  AC -const  ruction,  though  tractable,  is  rather  involved  and  not  yet 
complete.  All  results  presented  are  for  the  case  of  uniform  p(p). 

We  now  proceed  with  the  AC-construction  for  the  general  one-dimensional 
(M  —  1)  quasi-convex  optimization  problem.  We  introduce  the  (perforce 
convex)  regions  fi  =  [pn^Pfi]>  =  IPs^Pth  =  [pSi^a)  u  and 

AC  =  [ P £  —  £j ,P£  -f  £i]  shown  in  Figure  7  (note  P£  are  random  variables); 
for  clarity  of  exposition,  we  assume  that  pj  >  P^,p$  <  Pj£,  and  P £  > 
Pn  +  £ii  <  Pn  ~  £i-  First,  from  (27)  we  know  that,  with  probability 
greater  than  or'equal  to  1  —£2,  U*  is  of  total  length  less  than  or  equal  to 
£j;  note  that  U *  need  not  be  convex.  If  U *  is  of  total  length  less  than  or 
equal  to  eit  then:  there  exists  a  point  p'  in  71$  \  U*\  for  any  point  p'"  in 
ft  \  AC,  there  exists  a  point  p"  in  0a  H  AC  \  U*  for  which  p'"  <  p"  <  p'  or 
p'  <  p"  <  p'".  For  example,  to  prove  the  latter,  take  (say)  p'"  >  P£  +£1,  and 
assume  that  no  point  p"  in  0a  D  AC  \  U*  exists  such  that  p'  <  p"  <  p'";  but 
then,  (PaiP,w)  C  U*,  and  since  p'"  >  P£  +  £|,  we  arrive  at  a  contradiction. 
We  next  claim  that,  for  p'  in  7 Z$  \  U *  and  p"  in  0A  H  AC  \  U* , 


W;k)>^mm  +  E*MX  +  S 

(33) 

and 

$(p";A)>$(p';A). 

(34) 

Inequality  (33)  follows 

from 

=  |*(p-;  A)  -  #(p";A)  +  *(p";  A)  -  *(p";A)I 

>  I *(F;A)  -  *(p";  A) I  -  -  *(p";  A)l , 


and  |*(F;A)  -  *(pw;A)l  >  2 +  6 ,  |*(p";A)  -  *(p";  A)|  <  £^x  (recall 
p"  £  U*).  Inequality  (34)  follows  from  (33)  and  the  results  of  Section  4.3.1. 
Lastly,  the  event  (say)  p'  <  p"  <  p'"  (p'  in  71$  \U* ,  p"  in  0a  n  AC  \  U* ,  p” 
in  ft  \  AC)  implies,  from  the  inequality  (34)  and  quasi-convexity,  4>(p";  A)  < 
max($(p';  A.),  $(pw;  A)],  that  $(p";  A)  can  not  be  greater  than  $(p"';  A);  thus, 
there  exists  a  minimizer  of  $(p;  A)  within  AC.  This  proves  the  desired  result, 
and  constructs  the  requisite  region,  AC. 

We  next  pass  to  the  two-dimensional  case,  and  present  the  AC-construction 
for  the  particular  model  problem  discussed  in  Section  4.3.1:  M  =  2;  p  £  ft  = 
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[—1,1]  *  [ —  1, 1];  p(g)  =  1/4;  surrogate  objective  function  (32);  and  predic¬ 
tion  error  estimate,  R£^x.  As  described  in  Section  4.3.1,  6  =  4reiw/*,  with 
given  by  an  elliptical  region  with  (major(pi),minor(p2))  axes  \fl{  1,  l/u>). 
Following  an  analysis  conceptually  similar  to  —  though  technically  more 
complicated  than  —  our  analysis  for  the  one-dimensional  optimization  prob¬ 
lem,  we  find  that,  as  6  — ►  0,  is  the  exterior  of  an  ellipse-like  (though  not 
exactly  elliptical)  region  of  (major(pi),minor(p2))  axes  y/6^/\  +  2^(1,  1A*>), 
and  RAC  is  an  ellipse-like  region  of  (major(pi),minor(p2))  axes 

>/6{y/l  +  2(  +  2/r[l  +  ^1  +  r/f+2Cl}(l,  1M , 

where  £  =  R£*.v/^~  Note  that,  as  R£*4X  and  5  — ♦  0,  RAC  shrinks  to  £*. 

The  implications  of  our  results  are  clear.  First,  from  the  theoretical  per¬ 
spective,  we  obtain  convergence  of  the  surrogate  minimizer  (and  probably, 
with  convexity,  the  surrogate  minimum)  to  the  actual  minimizer  (actual  min¬ 
imum)  as  Nev  —*  oc.  Second,  from  the  practical  perspective,  we  provide 
a  natural  framework  in  which  to  pursue  search-domain  reduction  strategies 
[50,51]:  an  initial  surrogate-based  optimization  study  over  (1  provides  the  de¬ 
sign  space,  RAC,  for  subsequent  adaptive  improvement;  with  high  confidence, 
RAC  contains  the  requisite  global  minimizer.  This  search-reduction  approach 
is,  in  practice,  hampered  by  the  pessimistically  large  regions  AT  that  result 
from  the  rather  minimal  assumptions  placed  on  $(£;  A)  and  $(g;  A). 

4.4  Adaptive  Improvement 

IF  (i)  on  the  basis  of  the  surrogate  minimum  $min(A)i  the  surrogate  mini¬ 
mizer,  £*(£),  the  surrogate  prediction  error  estimate,  the  predictabil¬ 

ity  gap,  Rg7,  the  sensitivity  region,  Tt$,  and  the  containing  region,  RAC,  the 
surrogate  minimum  and  minimizer  of  Step  6  are  deemed  unacceptable,  AND 
IF  (ii)  further  appeals  to  £(£)  are  permitted  (e.g.,  Nev  reflects  only  part  of 
the  total  resource  allocation  for  the  entire  project,  or  additional  resources 
can  be  renegotiated  given  the  optimization  results  to  date),  THEN  a  pos¬ 
teriori  estimate-based  adaptive  improvement  can  be  pursued.  (Note  that  if 
(i)  is  false,  then  we  simply  declare  success  and  return  to  Step  5  of  the  SBO 
algorithm;  however,  if  (i)  is  true  but  (ii)  is  false,  we  must  admit  defeat.  In 
the  latter  case,  the  a  posteriori  estimates  serve  the  unpopular  but  valuable 
function  of  qualifying  the  surrogate  minimum  and  minimzer.) 
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Two  adaptive  improvement  branches  (from  the  local  bindings  associated 
with  the  parent  SBO  procedure)  can  be  envisioned.  In  the  first  branch, 
we  simply  evoke  one  additional  appeal  to  £(p)  to  determine  actual  system 
performance  at  £  =  £*,  <£(£(£’);£*;  A);  all  existing  information  implicates  £* 
as  a  viable  design  point,  and  this  possibility  therefore  merits  investigation 
before  proceeding  further.  In  the  second  branch,  we  recursively  evoke  a 
second  instantiation  of  the  SBO  Algorithm,  in  which  we  re-appeal  to  the 
databoard  (see  Section  5)  or  expensive  simulation  £(p)  in  order  to:  if  £  — 
BtE^/S  >  1,  devote  additional  input-output  pairs  to  the  validation  of  new 
models  (see  Section  5)  or  to  the  further  construction  of  existing  models;  if 
£  =  %lE*mr/6  <  1,  devote  additional  input-output  pairs  to  the  validation 
of  existing  models.  Re-appeal  to  the  databoard  or  ,£(£)  will  typically  be 
accompanied  by  search-domain  modification,  in  which  we,  say,  focus  p(p)  in, 
or  relocate  fl  to,  the  RAC-neighborhood  of  the  current  surrogate  minimizer. 

Adaptive,  or  multipass,  strategies,  in  which  optimization  information 
feeds  back  to  the  surrogate  hypotheses,  are  clearly  a  necessity.  As  in  ex¬ 
perimental  data  collection  procedures  [7],  all  diagnostic  (here  simulation) 
resources  should  not  be  expended  in  the  first  salvo;  unfortunately,  as  for  ex¬ 
perimental  inquiries,  we  can  hope  to  proffer  only  “rules  of  thumb”  as  to  when 
to  commit  resources.  More  explicit  strategies  for,  and  examples  of,  multipass 
interaction  [49]  will  be  addressed  in  future  papers. 

4.5  Eddy— Promoter  Heat  Exchanger 

We  consider  now  the  eddy-promoter  heat  exchanger  example.  The  objec¬ 
tive  function  is  given  by  (8);  we  choose  p(p)  to  be  uniform  over  fltr;  we  set 
Nev  =  44  and  £i  =  =  .1.  The  flowrate  and  Nusselt  number  interme¬ 

diate  physical  surrogates  are  constructed  as  in  Section  3.2.3  based  on  the 
construction  sample  shown  in  Figure  5.  For  the  definition  vector  we  take 
Aep  =  A**'1  =  {0i  =  10-7,/?2  =  -20,03  =  11,  k  =  2.0}.  Proceeding  to  Step 
6  of  the  SBO  Algorithm,  we  find  (for  this  two-dimensional  case  by  a  sim¬ 
ple  search)  $^n(AEP'1)  =  -23,  £EP*(AEP1)  =  (.40,. 14)  for  the  surrogate  and 
surrogate  minimizer,  respectively;  a  contour  plot  of  the  surrogate  objective 
function  is  shown  in  Figure  8.  From  Step  7  of  the  SBO  Algorithm  we  find,  for 
the  particular  validation  sample  shown  in  Figure  5,  ftE^  =  .0605.  Then, 
from  the  a  posteriori  analysis  of  Step  8,  we  calculate  (for  r  =  2)  6  =  .029,  and 
thus  Ra  =  .090  and  £  =  2.14;  we  show  in  Figure  8  the  region  77$  in  which, 
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from  (30),  with  confidence  level  greater  than  1  -  £2,  we  can  find  a  £EP,£|P, 
such  that  $EP(p^F,  AEP'1)  (actual  system  performance)  <  $££n(AEP,1)+.Q90. 
Step  9,  we  re-appeal  to  the  Navier-Stokes  subproblem  for  £EP  =  pEP*(AEP  l) 
to  obtain  $EP(£EP*(AEP’1);  AEP’1)  =  .253,  quite  close  to  the  value  predicted  by 
the  surrogate  objective  function. 

We  now  return  to  Step  5,  and  reselect  AEP'2  =  {0i  =  10“6,  =  .01,  #3  = 

1.1,  k  =  2.0},  because  (say)  the  optimum  predicted  in  Figure  8  corresponds 
to  a  pump  size  which  is  unexpectedly  large.  Exploiting  the  same  samples 
and  intermediate  physical  surrogates  as  for  the  first  study  (recall  the  result¬ 
ing  estimates  are  not  joint  pending  the  revised  multiple-output  sample-size 
requirement  of  Section  5),  we  find,  in  Step  6  and  Step  7,  $Jj,pn(AEP’2)  = 
.190,  £EP*(AEP’2)  =  (.80,  .74),  and  =  .0611  for  the  surrogate,  surro¬ 

gate  minimizer,  and  prediction  error  estimate,  respectively;  a  contour  plot 
of  the  surrogate  objective  function  is  shown  in  Figure  9.  Pursuing  in  Step 
8  our  a  posteriori  analysis,  we  calculate  (for  r  =  2)  6  =  .022,  and  thus 
Rco  =  .083  and  £  =  2.77;  we  show  in  Figure  9  the  region  Tls  in  which,  from 
(30),  with  confidence  level  greater  than  1  —  £2,  we  can  find  a  £EP,£^P,  such 

that  $EP(£^P,AEP'2)  (actual  system  performance)  <  ^^(A**'2)  +  -083.  Note 
that,  even  if  we  falsely  presume  global  quasi-convexity,  the  region  S/C  in 
which  the  actual  minimizer  must  lie  is,  disappointingly,  essentially  fiEI\  In 
Step  9  we  re-appeal  to  the  Navier-Stokes  subproblem  for  £EP  =  £EP*(AEP‘2)  to 
obtain  $(£EP*(AEP’2);  AEP'2)  =  <194,  again  quite  close  to  the  value  predicted  by 
the  surrogate  objective  function.  It  is  not  surprising  that,  with  the  increased 
(decreased)  penalty  on  pumping  power  (materials  cost),  optimal  performance 
now  occurs  at  a  lower  flowrate  in  the  vicinity  of  the  global  maximum  in  heat 
transfer. 


5  Extensions 

Classification  Procedures.  In  classification  problems  we  search  for  a  re¬ 
gion  of  input  space  in  which  certain  conditions  are  satisfied;  for  example, 
in  the  eddy  promoter  problem,  we  might  be  interested  in  that  region  of 
17EP  in  which  the  heat  transfer  rate,  {2(a,  E),  is  greater  than  a  prescribed 
threshold.  In  such  situations,  it  is  clearly  advantageous  to  replace  (say)  the 
Navier-Stokes  equations  with  a  less  expensive  —  surrogate  —  {0, 1}  char- 
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acteristic  function.  For  noisy  classification  problems,  statistical  prediction 
rules  [9]  or  neural  network  approaches  [11]  prove  effective;  for  deterministic 
simulation-based  classification  problems,  we  develop  in  [2]  a  Boolean  Voronoi 
construction  method  and  a  binomial-tail-statistic  [16]  validation  technique. 
Modelling  and  classification  algorithms  share  much  in  common  as  regards 
both  motivation  and  formulation,  and,  in  the  future,  must  be  combined  in 
a  single  procedure,  in  which  a  model-based  surrogate  objective  function  is 
minimized  over  a  classification-based  feasible  domain. 

The  Databoard.  The  databoard  concept,  developed  in  [2],  permits  in¬ 
vestigations  defined  over  different  input  domains  to  share  data.  The  tech¬ 
nique,  based  on  simple  conditional  samplying  procedures,  is  best  illustrated 
by  an  example.  Consider  the  second  eddy-promoter  heat  exchanger  opti¬ 
mization  problem  considered  in  Section  4.4,  in  which  the  minimum  is  near 
the  {a  =  1.0, R  =  .95}  vertex  of  flEp.  Assume,  however,  that  the  original 
design  space  is  defined  to  be  not  the  triangular  QEP,  but,  rather,  the  square 
domain  ft,EP shown  in  Figure  10.  Upon  minimization,  the  minimizer  will 
no  doubt  reside  on  the  boundary  of  Ofp,  perhaps  prompting  the  investigator 
to  expand  the  design  space  to  (say)  the  full  triangle,  (lEr.  In  performing 
the  subsequent  adaptive  improvement  over  0EP,  the  researcher  can:  recycle 
existing  data  from  UEP;  evoke  new  simulations  only  when  Qfp  is  depleted, 
or  when  the  requested  input  point  lies  in  f lEP  \  Hfp;  post  new  simulations  to 
the  databoard  for  the  benefit  of  future  investigations.  The  concept  is  readily 
expanded  to  permit  rather  general  input  domains  and  both  modelling  and 
classification  studies;  furthermore,  if  raw,  rather  than  processed,  simulation 
data  is  posted  to  the  databoard,  significant  output  flexibility  can  be  achieved. 

Multiple-Output  Validation.  We  discuss  here  the  generalization  of  the 
Model  Validation  Algorithm  (and,  by  obvious  extension,  the  MCV  and  SbO 
Algorithms)  to  the  case  of  multiple  outputs,  K  >  1.  In  particular,  we  con¬ 
sider  the  situation  in  which  we  wish  to  validate  I\  >  1  outputs  (e.g.,  for  the 
eddy-promoter  problem,  the  flowrate  and  the  Nusselt  number)  at  the  same 
sample  input  points,  Pt , . . . ,  Pj^v*, 

EL,  =  max,o}  E)  ,  fc=l,...,A,  (35) 

E)  =  |S*(£)  -  £*(£,)[ ,  j  =  1,  •  •  • ,  Nva  ,  (36) 
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where  5*(g)  and  $*(g)  refer  to  the  klh  component  of  the  £{£)  and  £(£) 
vectors,  respectively.  It  is  readily  shown  [32]  by  multinomial  extension  of  the 
binomial  arguments  described  for  the  single-output  case  that,  if  we  simply 
replace  the  sample-size  requirement  of  Step  1  of  the  MV  Algorithm  with 


xrva  _ 

'  ln(l  —  £i)  ’ 


(37) 


then  the  K  outputs  jointly  satisfy  a  validation  statement, 


Pr 


</, 


{£eni!s*(£)-5*(£)l<^.,,} 


p(s)dE  >  1  -et,k 


1, . . . ,  A'}  >  1  —  £j  .  (38) 


Note  to  arrive  at  the  simple  expression  (37),  certain  quantities  in  the  multi¬ 
nomial  expansion  are  bounded.  However,  the  requirement  (37)  is  often  pes¬ 
simistic  in  practice  not  because  of  the  bounds,  which  are  rather  sharp  for 
K  <  1/fii,  but  because  most  actual  output  vectors  are  better  correlated  than 
the  worst-case  assumption  inherent  in  (37).  The  remarkable  conclusion  from 

(37)  is  that  only  logarithmically  more  simulations  must  be  performed  in  order 
to  jointly  validate  multiple  outputs  over  a  common  input  set;  indeed,  if  (say) 
£2  =  .01,  K  =  100  outputs  require  only  twice  the  sample  size  as  a  single 
output  ( K  =  1).  This  logarithmic  dependence  further  justifies  the  surro¬ 
gate  concept;  if,  to  obtain  joint  estimates,  the  sample  size  grew  linearly  with 
K ,  the  surrogate  approach  would  be  not  too  different  from  direct  insertion 
procedures  as  regards  adaptability. 

We  mention  three  applications  of  the  multiple-output  theory.  First,  mul¬ 
ticriteria  optimization  frameworks  can  now  be  addressed.  Second,  multiple- 
optimization  studies  (A1, A2--.)  can  be  jointly  validated  so  that  confidence 
can  be  assured  not  only  in  the  final  study,  but  in  all  earlier  studies  on  which 
the  final  study  is  conditioned:  by  replacing  Step  1  of  the  SBO  Algorithm 
with  (37),  the  A  variation  in  Step  5  is  now  jointly  justified.  Third,  if  we 
interpret  the  K  outputs  as  the  I\  errors  associated  with  a  single  physical 
output  approximated  by  K  different  models,  (37)  permits  efficient  model- 
optimal  construction  procedures:  we  replace  Step  1  of  the  MCV  Algorithm 
with  (37);  we  test  several  candidate  models  according  to  (35);  we  choose 
the  best  model  based  on  accuracy  or  cost  criteria  [5,6];  we  are  assured,  from 

(38) ,  that  our  rank  ordering  is  significant,  and  that  our  validation  statement 
applies  to  the  particular  model  selected.  Sequential  procedures  can  also  be 
pursued. 
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6  Conclusions 


Surrogate  techniques  constitute  an  attractive  alternative  to  “direct  insertion” 
for  the  incorporation  of  large-scale  simulations  into  engineering  optimiza¬ 
tion  studies.  Simulation  surrogates  provide  direct  resource  control,  support 
flexibility  in  design  objectives  and  specifications,  and  gainfully  accomodate 
prior  information.  Furthermore,  the  particular  construction-validation  pro¬ 
cedures  proposed  here  enjoy  a  posteriori  error  estimates  that  permit  both 
qualification  of  surrogate  results  and  guidance  for  subsequent  adaptive  im¬ 
provement.  Much  additional  work  is  required,  however,  if  the  simulation 
surrogate  framework  is  to  prove  useful  in  engineering  design.  In  particular, 
multipass  adaptive  refinement  strategies  must  be  articulated  for  both  single- 
and  multiple-optimization-study  projects,  with  emphasis  on  construction- 
validation  refinement,  search  domain  reduction  and  relocation,  and  gradu¬ 
ated  deployment  of  resources. 
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Figure  1:  One  periodicity  cell  of  the  eddy-promoter  heat  exchanger. 
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Figure  7:  Regions  ft,  B&,  and  1C  associated  with  one-dimensional  quasi-convex  opti¬ 
mization  problem. 
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Figure  8:  Contour  plot  for  $EP(£EP;  AEP  l)-  The  surrogate  minimum  over  QEP  is  $m1n(AEP'1)  = 
.223;  the  surrogate  maximum  over  f2EP  is  .550;  contours  delineate  level  sets  of  relative 
weighted  volume  .1,  .2, . . . ,  .9.  The  dashed  contour  encloses  WRg  for  r  =  2. 


Figure  9:  Contour  plot  for  <£EP(pEP;  AEP’2).  The  surrogate  minimum  over  f2EP  is  4>EPn(AEP'2)  = 
.190;  the  surrogate  maximum  over  QEP  is  .385;  contours  delineate  level  sets  of  relative 
weighted  volume  .1,  .2, . . . ,  .9.  The  dashed  contour  encloses  9?7£<5  for  r  =  2. 
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